Abstract

A Stackelberg game is played between a leader and a follower. The leader first chooses an action,
then the follower plays his best response. The goal of the leader is to pick the action that will maximize 
his payoff given the follower’s best response. In this paper we present an approach to solving for the 
leader’s optimal strategy in certain Stackelberg games where the follower’s utility function (and thus the 
subsequent best response of the follower) is unknown. 

Stackelberg games capture, for example, the following interaction between a producer and a
consumer. The producer chooses the prices of the goods he produces, and then a consumer chooses to
buy a utility maximizing bundle of goods. The goal of the seller here is to set prices to maximize his 
profit—his revenue, minus the production cost of the purchased bundle. It is quite natural that the seller 
in this example should not know the buyer’s utility function. However, he does have access to revealed 
preference feedback—he can set prices, and then observe the purchased bundle and his own profit. We 
give algorithms for efficiently solving, in terms of both computational and query complexity, a broad 
class of Stackelberg games in which the follower’s utility function is unknown, using only “revealed 
preference” access to it. This class includes in particular the profit maximization problem, as well as 
the optimal tolling problem in nonatomic congestion games, when the latency functions are unknown. 
Surprisingly, we are able to solve these problems even though the optimization problems are non-convex 
in the leader’s actions. 
1 Introduction


Consider the following two natural problems: 

1. Profit Maximization via Revealed Preferences: A retailer, who sells $d$ goods, repeatedly interacts
with a buyer. In each interaction, the retailer decides how to price the $d$ goods by choosing $p \in \mathbb{R}^d_+$,
and in response, the buyer purchases the bundle $x \in \mathbb{R}^d_+$ that maximizes her utility $v(x) - \langle x, p \rangle$,
where $v$ is an unknown concave valuation function. The retailer observes the bundle purchased, and
therefore his profit, which is $\langle x, p \rangle - c(x)$, where $c$ is an unknown convex cost function. The retailer
would like to set prices that maximize his profit after only a polynomial number of interactions with 
the buyer. 

2. Optimal Tolling via Revealed Behavior: A municipal authority administers $m$ roads that form a
network $G = (V, E)$. Each road $e \in E$ of the network has an unknown latency function $\ell_e : \mathbb{R}_+ \to \mathbb{R}_+$
which determines the time it takes to traverse the road given a level of congestion. The authority
has the power to set constant tolls $\tau_e \in \mathbb{R}_+$ on the roads in an attempt to manipulate traffic flow. In
rounds, the authority sets tolls, and then observes the Nash equilibrium flow induced by the non-atomic 
network congestion game defined by the unknown latency functions and the tolls, together with the 
social cost (average total latency) of the flow. The authority would like to set tolls that minimize the 
social cost after only a polynomial number of rounds. 

Although these problems are quite different, they share at least one important feature—the retailer and 
the municipal authority each wish to optimize an unknown objective function given only query access to 
it. That is, they have the power to choose some set of prices or tolls, and then observe the value of their 
objective function that results from that choice. This kind of problem (alternately called bandit or zeroth 
order optimization) is well-studied, and is well understood in cases in which the unknown objective being 
maximized (resp. minimized) is concave (resp. convex). Unfortunately, the two problems posed above 
share another important feature—when posed as bandit optimization problems, the objective function being 
maximized (resp. minimized) is generally not concave (resp. convex). For the profit maximization problem, 
even simple instances lead to a non-concave objective function.

Example 1. Consider a setting with one good ($d = 1$). The buyer’s valuation function is $v(x) = \sqrt{x}$, and the
retailer’s cost function is $c(x) = x$. The buyer’s utility for buying $x$ units at price $p$ is $\sqrt{x} - x \cdot p$. Thus, if
the price is $p$, a utility-maximizing buyer will purchase $x^*(p) = \frac{1}{4p^2}$ units. The profit of the retailer is then

$$\mathrm{Profit}(p) = p \cdot x^*(p) - c(x^*(p)) = \frac{1}{4p} - \frac{1}{4p^2}.$$

Unfortunately, this profit function is not concave. 
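
The non-concavity in Example 1 can be checked numerically. The following is a minimal Python sketch, assuming the closed forms of the example ($x^*(p) = 1/(4p^2)$ and $c(x) = x$); the function names are illustrative and do not come from the paper.

```python
def induced_demand(price):
    """Buyer's best response x*(p) = 1/(4 p^2) for v(x) = sqrt(x) (Example 1)."""
    return 1.0 / (4.0 * price ** 2)


def profit_of_price(price):
    """Retailer's profit p * x*(p) - c(x*(p)) with cost c(x) = x."""
    x = induced_demand(price)
    return price * x - x


def violates_midpoint_concavity(f, a, b):
    """True if f at the midpoint lies strictly below the chord between a and b."""
    return f(0.5 * (a + b)) < 0.5 * (f(a) + f(b))


# A concave function never lies below its chords; the profit does on [3, 5].
chord_violated = violates_midpoint_concavity(profit_of_price, 3.0, 5.0)
```

This agrees with a direct calculation: the second derivative of $\frac{1}{4p} - \frac{1}{4p^2}$ is $\frac{p - 3}{2p^4}$, which is positive for $p > 3$, so the profit is convex rather than concave on that range.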

Since the retailer’s profit function is not concave in the prices, it cannot be optimized efficiently using 
generic methods for concave maximization. This phenomenon persists into higher dimensions, where it 
is not clear how to efficiently maximize the non-concave objective. The welfare objective in the tolling 
problem is also non-convex in the tolls. We give an example in Appendix A. 

Surprisingly, despite this non-convexity, we show that both of these problems can be solved efficiently 
subject to certain mild conditions. More generally, we show how to solve a large family of Stackelberg 
games in which the utility function of the “follower” is unknown. A Stackelberg game is played by a 
leader and a follower. The leader moves first and commits to an action (e.g., setting prices or tolls as in 
our examples), and then the follower best responds, playing the action that maximizes her utility given the
leader’s action. The leader’s problem is to find the action that will optimize his objective (e.g., maximizing 
profit, or minimizing social cost as in our examples) after the follower best responds to this action. 

Traditionally, Stackelberg games are solved assuming that the leader knows the follower’s utility
function, and thus his own utility function. But this assumption is very strong, and in many realistic settings
the follower’s utility function will be unknown. Our results give general conditions, along with several natural
examples, under which the problem of computing an optimal Stackelberg equilibrium can be solved
efficiently with only revealed-preference access to the follower’s utility function.

For clarity of exposition, we first work out our solution in detail for the special case of profit
maximization from revealed preferences. We then derive and state our general theorem for optimally solving a class of
Stackelberg games where the follower’s utility is unknown. Finally, we show how to apply the general
theorem to other problems, including the optimal tolling problem mentioned above and a natural principal-agent
problem.


1.1 Our Results and Techniques 

The main challenge in solving our class of Stackelberg games is that for many natural examples, the leader’s 
objective function is not concave when written as a function of his own action. For instance, in our example, 
the retailer’s profit is not concave as a function of the price he sets. Our first key ingredient is to show that in 
many natural settings, the leader’s objective is concave when written as a function of the follower’s action. 

Consider again the retailer’s profit maximization problem. Recall that if the buyer’s valuation function is
$v(x) = \sqrt{x}$, then when she faces a price $p$, she will buy the bundle $x^*(p) = 1/(4p^2)$. In this simple case, we
can see that setting a price of $p^*(x) = 1/(2\sqrt{x})$ will induce the buyer to purchase $x$ units. In principle, we
can now write the retailer’s profit function as a function of the bundle $x$. In our example, the retailer’s cost
function is simply $c(x) = x$. So,

$$\mathrm{Profit}(x) = p^*(x) \cdot x - c(x) = \frac{\sqrt{x}}{2} - x.$$



Written in terms of $x$, the profit function is concave! As we show, this phenomenon continues in higher
dimensions, for arbitrary convex cost functions $c$ and for a wide class of concave valuation functions
satisfying certain technical conditions, including the well-studied families of CES and Cobb-Douglas utility
functions.
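
The change of variables from prices to bundles can be illustrated numerically. The sketch below assumes the running example ($v(x) = \sqrt{x}$, $c(x) = x$) and checks midpoint concavity of the profit on a grid of bundles; the helper names and the grid are illustrative choices, not part of the paper.

```python
import math


def inducing_price(x):
    """Price p*(x) = 1/(2 sqrt(x)) at which the buyer with v(x) = sqrt(x) buys x."""
    return 1.0 / (2.0 * math.sqrt(x))


def profit_of_bundle(x):
    """Profit written in the bundle: p*(x) * x - c(x) = sqrt(x)/2 - x for c(x) = x."""
    return inducing_price(x) * x - x


def is_midpoint_concave(f, points, tol=1e-12):
    """Check f((a+b)/2) >= (f(a)+f(b))/2 for every pair of sample points."""
    return all(
        f(0.5 * (a + b)) >= 0.5 * (f(a) + f(b)) - tol
        for a in points
        for b in points
    )


grid = [0.01 * k for k in range(1, 401)]  # sample bundles in (0, 4]
concave_on_grid = is_midpoint_concave(profit_of_bundle, grid)
```

A grid check is only evidence, not a proof; here concavity also follows directly because $\sqrt{x}/2$ is concave and $-x$ is linear.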

Thus, if the retailer had access to an oracle for the concave function $\mathrm{Profit}(x)$, we could use an algorithm
for bandit concave optimization to maximize the retailer’s profit. Unfortunately, the retailer does not directly
get to choose the bundle purchased by the buyer and observe the profit for that bundle: he can only set prices
and observe the buyer’s chosen bundle $x^*(p)$ at those prices, and the resulting profit $\mathrm{Profit}(x^*(p))$.

Nevertheless, we have reduced the retailer’s problem to a possibly simpler one. In order to find the
profit-maximizing prices, it suffices to give an algorithm which simulates access to an oracle for $\mathrm{Profit}(x)$
given only the retailer’s query access to $x^*(p)$ and $\mathrm{Profit}(x^*(p))$. Specifically, if for a given bundle $x$, the
retailer could find prices $p$ such that the buyer’s chosen bundle $x^*(p) = x$, then he could simulate access to
$\mathrm{Profit}(x)$ by setting prices $p$ and receiving $\mathrm{Profit}(x^*(p)) = \mathrm{Profit}(x)$.

Our next key ingredient is a “tâtonnement-like” procedure that efficiently finds prices that approximately
induce a target bundle $x$ given only access to $x^*(p)$, provided that the buyer’s valuation function is Hölder
continuous and strongly concave on the set of feasible bundles. Specifically, given a target bundle $x$, our
procedure finds prices $p$ such that $|\mathrm{Profit}(x^*(p)) - \mathrm{Profit}(x)| \le \varepsilon$. Thus, we can use our procedure to
simulate approximate access to the function $\mathrm{Profit}(x)$. Our procedure requires only $\mathrm{poly}(d, 1/\varepsilon)$ queries
to $x^*(p)$. Using recent algorithms for bandit optimization due to Belloni et al. [BLNR15], we can
maximize the retailer’s profits efficiently even with only approximate access to $\mathrm{Profit}(x)$. When our algorithms
receive noiseless feedback, we can improve the dependence on the approximation parameter $\varepsilon$ to be only
$\mathrm{poly}(\log 1/\varepsilon)$.

A similar approach can be used to solve the optimal tolling problem assuming the unknown latency
functions are convex and strictly increasing. As in the preceding example, the municipal authority’s objective
function (social cost) is not convex in the tolls, but is convex in the induced flow. Whenever the latency
functions are strictly increasing, the potential function of the routing game is strongly convex, and so we can
use our tâtonnement procedure to find tolls that induce target flows at equilibrium.

Our results for maximizing profits and optimizing tolls follow from a more general method that allows
the leader in a large class of continuous-action Stackelberg games to iteratively and efficiently maximize his
objective function while only observing the follower’s response. The class requires the following conditions:

1. The follower’s utility function is strongly concave in her own actions and linear in the leader’s actions. 

2. The leader’s objective function is concave when written as a function of the follower’s actions. 1 

Finally, we show that our techniques are tolerant to two different kinds of noise. Our techniques work
even if the follower only approximately maximizes her utility function, which corresponds to bounded but
adversarially chosen noise, and also if unbounded but well-behaved (i.e., zero-mean and bounded-variance)
noise is introduced into the system. To illustrate this noise tolerance, we show how to solve a simple
$d$-dimensional principal-agent problem, in which the principal contracts for the production of $d$ types of goods
that are produced as a stochastic function of the agent’s actions.

1.2 Related Work 

There is a very large literature in operations research on solving so-called “bilevel programming” problems, 
which are closely related to Stackelberg games. Similar to a Stackelberg game, the variables in a bilevel 
programming problem are partitioned into two “levels.” The second-level variables are constrained to be the 
optimal solution to some problem defined by the first-level variables. See [CMS05] for a survey of the bilevel 
programming literature. Unlike our work, this literature does not focus substantially on computational issues 
(many of the algorithms are not polynomial time). [KCP10] show that optimally solving certain discrete 
Stackelberg games is NP-hard. Even ignoring computational efficiency, this literature assumes knowledge 
of the objective function of the “follower.” Our work departs significantly from this literature by assuming 
that the leader has no knowledge of the follower’s utility function. 

There are two other works that we are aware of that consider solving Stackelberg games when the
follower’s utility function is unknown. Letchford, Conitzer, and Munagala [LCM09] give algorithms for
learning optimal leader strategies with a number of queries that is polynomial in the number of pure
strategies of the leader. In our setting, the leader has a continuous and high-dimensional action space, and so
the results of [LCM09] do not apply. Blum, Haghtalab, and Procaccia [BHP14] consider the problem of
learning optimal strategies for the leader in a class of security games. They exploit the structure of security
games to learn optimal strategies for the leader in a number of queries that is polynomial in the
representation size of the game (despite the fact that the number of pure strategies is exponential). The algorithm
of [BHP14] is not computationally efficient; indeed, the problem they are solving is NP-hard. Neither of
these techniques applies to our setting. Despite the fact that in our setting the leader has a continuous
action space (which is exponentially large even under discretization), we are able to give an algorithm with
both polynomial query complexity and polynomial running time.

1 When the leader and follower are instead trying to minimize a cost function, replace “concave” with “convex” in the above.

There is also a body of work related to our main example of profit maximization. Specifically,
there is a recent line of work on learning to predict from revealed preferences ([BV06, ZR12, BDM+14]). In
this line, the goal is to predict buyer behavior, rather than to optimize seller prices. Following these works,
Amin et al. [ACD+15] considered how to find profit-maximizing prices from revealed preferences in the
special case in which the buyer has a linear utility function and a fixed budget. The technique of [ACD+15] is
quite specialized to linear utility functions, and does not easily extend to more general utility functions in the
profit maximization problem, nor to Stackelberg games in general. “Revealed preferences” queries are
quite similar to demand queries (see e.g. [BN09]). Demand queries are known to be sufficient to find welfare 
optimal allocations, and more generally, to be able to solve separable convex programs whose objective is 
social welfare. In contrast, our optimization problem is non-convex (and so the typical methodology by 
which demand queries are used does not apply), and our objective is not welfare. 

The profit maximization application can be viewed as a dynamic pricing problem in which the seller
has no knowledge of the buyers’ utilities. Babaioff et al. [BDKS15] study a version of this problem that
is incomparable to our setting. On the one hand, [BDKS15] allow for distributions over buyers. On the
other hand, [BDKS15] is limited to selling a single type of good, whereas our algorithms apply to selling
bundles of many types of goods. There is also work related to our optimal tolling problem. In an elegant
paper, Bhaskar et al. [BLSS14] study how one can iteratively find tolls such that a particular target flow
is an equilibrium of a non-atomic routing game where the latency functions are unknown, which is a
sub-problem we also need to solve in the routing application. Their technique is specialized to routing games,
and requires that the unknown latency functions have a known simple functional form (linear or low-degree
convex polynomial). In contrast, our technique works quite generally, and in the special case of routing
games, does not require the latency functions to satisfy any known functional form (or even be convex). Our
technique can also be implemented in a noise-tolerant way, although at the expense of having a polynomial
dependence on the approximation parameter, rather than a polylogarithmic dependence. (In the absence
of noise, our method can also be implemented to depend only polylogarithmically on the approximation
parameter.)

Finally, our work is related in motivation to a recent line of work designed to study the sample complexity
of auctions [BBHM08, CRH, HMR14, DHN14, CHN14, BMM15, MR15]. In this line of work, as in
our work, the goal is to optimize an objective in a game-theoretic setting when the designer has no direct
knowledge of participants’ utility functions.

2 Preliminaries 

We will denote the set of non-negative real numbers by $\mathbb{R}_+ = \{x \in \mathbb{R} \mid x \ge 0\}$ and the set of positive real
numbers by $\mathbb{R}_{>0} = \{x \in \mathbb{R} \mid x > 0\}$. For a set $C \subseteq \mathbb{R}^d$ and a norm $\|\cdot\|$, we will use $\|C\| = \sup_{x \in C} \|x\|$
to denote the diameter of $C$ with respect to the norm $\|\cdot\|$. When the norm is unspecified, $\|\cdot\|$ will denote
the Euclidean norm $\|\cdot\|_2$.

An important concept we use is the interior of a set. In the following, we will use $B_u$ to denote the unit
ball centered at $u$ for any $u \in \mathbb{R}^d$.

Definition 1. For any $\delta > 0$ and any set $C \subseteq \mathbb{R}^d$, the $\delta$-interior $\mathrm{Int}_{C,\delta}$ of $C$ is a subset of $C$ such that a
point $x$ is in the $\delta$-interior $\mathrm{Int}_{C,\delta}$ of $C$ if the ball of radius $\delta$ centered at $x$ is contained in $C$, that is:

$$x + \delta B_0 = \{x + \delta y \mid \|y\| \le 1\} \subseteq C.$$

The interior $\mathrm{Int}_C$ of $C$ is a subset of $C$ such that a point $x$ is in $\mathrm{Int}_C$ if there exists some $\delta' > 0$ such that $x$
is in $\mathrm{Int}_{C,\delta'}$.

We will also make use of the notions of Hölder continuity and Lipschitzness.

Definition 2. A function $f : C \to \mathbb{R}$ is $(\lambda, \beta)$-Hölder continuous for some $\lambda, \beta > 0$ if for any $x, y \in C$,

$$|f(x) - f(y)| \le \lambda \|x - y\|^{\beta}.$$

A function $f$ is $\lambda$-Lipschitz if it is $(\lambda, 1)$-Hölder continuous.
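
To make the distinction between Hölder continuity and Lipschitzness concrete, the following sketch compares the two ratios for the square-root function on $[0, 1]$. The grid resolution and the helper name `holder_ratio` are illustrative assumptions, and a grid check is only numerical evidence.

```python
import math


def holder_ratio(f, x, y, beta):
    """Return |f(x) - f(y)| / |x - y|^beta for distinct points x != y."""
    return abs(f(x) - f(y)) / abs(x - y) ** beta


points = [k / 200.0 for k in range(0, 201)]  # grid on [0, 1]
pairs = [(x, y) for x in points for y in points if x != y]

# sqrt is (1, 1/2)-Holder on [0, 1]: with beta = 1/2 the ratio stays at most 1.
worst_holder = max(holder_ratio(math.sqrt, x, y, 0.5) for x, y in pairs)

# sqrt is not Lipschitz on [0, 1]: with beta = 1 the ratio grows near 0
# and would keep growing if the grid were refined.
worst_lipschitz = max(holder_ratio(math.sqrt, x, y, 1.0) for x, y in pairs)
```

The first bound holds exactly because $(\sqrt{x} - \sqrt{y})^2 \le |\sqrt{x} - \sqrt{y}| \cdot (\sqrt{x} + \sqrt{y}) = |x - y|$.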

2.1 Projected Subgradient Descent 

A key ingredient in our algorithms is the ability to minimize a convex function (or maximize a concave 
function), given access only to the subgradients of the function (i.e. with a so-called “first-order” method). 
For concreteness, in this paper we do so using the projected subgradient descent algorithm. This algorithm
has the property that it is noise-tolerant, which is important in some of our applications. However, we 
note that any other noise-tolerant first-order method could be used in place of gradient descent to obtain 
qualitatively similar results. In fact, we show in the appendix that for applications that do not require noise 
tolerance, we can use the Ellipsoid algorithm, which obtains an exponentially better dependence on the 
approximation parameter. Because we strive for generality, in the body of the paper we restrict attention to 
gradient descent. 

Let $C \subseteq \mathbb{R}^d$ be a compact and convex set that is contained in a Euclidean ball of radius $R$, centered at
some point $x_1 \in \mathbb{R}^d$. Let $c : \mathbb{R}^d \to \mathbb{R}$ be a convex “loss function.” Assume that $c$ is also $\lambda$-Lipschitz, that
is, $|c(x) - c(y)| \le \lambda \|x - y\|_2$. Let $\Pi_C$ denote the projection operator onto $C$,

$$\Pi_C(x) = \operatorname{argmin}_{y \in C} \|x - y\|.$$

Projected subgradient descent is an iterative algorithm that starts at $x_1 \in C$ and iterates the following
equations:

$$y_{t+1} = x_t - \eta g_t, \quad \text{where } g_t \in \partial c(x_t),$$
$$x_{t+1} = \Pi_C(y_{t+1}).$$

The algorithm has the following guarantee. 

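Theorem 3. The projected subgradient descent algorithm with $\eta = \frac{R}{\lambda \sqrt{T}}$ satisfies

$$c\left( \frac{1}{T} \sum_{t=1}^{T} x_t \right) \le \min_{y \in C} c(y) + \frac{R\lambda}{\sqrt{T}}.$$

Alternatively, the algorithm finds a solution within $\varepsilon$ of optimal after $T = (R\lambda/\varepsilon)^2$ steps.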
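
As an illustration of the iteration and of the guarantee in Theorem 3, here is a minimal pure-Python sketch. The instance (the loss $c(x) = \|x - a\|_2$ with $a = (3, 4)$, minimized over the unit ball) is a hypothetical example chosen because its optimum, $\|a\|_2 - 1 = 4$, is known in closed form; it is not taken from the paper.

```python
import math


def project_to_ball(y, radius):
    """Euclidean projection onto the ball of the given radius centered at the origin."""
    norm = math.sqrt(sum(v * v for v in y))
    if norm <= radius:
        return list(y)
    return [v * radius / norm for v in y]


def projected_subgradient_descent(subgradient, project, x1, eta, steps):
    """Iterate y_{t+1} = x_t - eta * g_t and x_{t+1} = project(y_{t+1}).

    Returns the average of the iterates x_1, ..., x_T, as in Theorem 3.
    """
    x = list(x1)
    total = [0.0] * len(x)
    for _ in range(steps):
        total = [s + v for s, v in zip(total, x)]
        g = subgradient(x)
        y = [v - eta * gv for v, gv in zip(x, g)]
        x = project(y)
    return [s / steps for s in total]


A = [3.0, 4.0]  # illustrative point outside the feasible unit ball


def loss(x):
    """Convex, 1-Lipschitz loss c(x) = ||x - A||_2."""
    return math.sqrt(sum((v - a) ** 2 for v, a in zip(x, A)))


def loss_subgradient(x):
    """A subgradient of the loss; 0 is valid at the kink x = A."""
    dist = loss(x)
    if dist == 0.0:
        return [0.0] * len(x)
    return [(v - a) / dist for v, a in zip(x, A)]


R, LAM, T = 1.0, 1.0, 2000           # radius, Lipschitz constant, number of steps
ETA = R / (LAM * math.sqrt(T))       # step size from Theorem 3
x_avg = projected_subgradient_descent(
    loss_subgradient, lambda y: project_to_ball(y, R), [0.0, 0.0], ETA, T
)
```

On this instance the averaged iterate should satisfy `loss(x_avg) <= 4 + R * LAM / math.sqrt(T)`, which is the bound of the theorem specialized to these constants.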

2.2 Strong Convexity 

We will make essential use of strong convexity/concavity of certain functions. 
Definition 4. Let $\phi : C \to \mathbb{R}$ be a function defined over a convex set $C \subseteq \mathbb{R}^d$. We say $\phi$ is $\sigma$-strongly convex
if for every $x, y \in C$,

$$\phi(y) \ge \phi(x) + \langle \nabla \phi(x), y - x \rangle + \frac{\sigma}{2} \cdot \|y - x\|_2^2.$$

We say $\phi$ is $\sigma$-strongly concave if $(-\phi)$ is $\sigma$-strongly convex.

An extremely useful property of strongly convex functions is that any point in the domain that is close 
to the minimum in objective value is also close to the minimum in Euclidean distance. 

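Lemma 5. Let $\phi : C \to \mathbb{R}$ be a $\sigma$-strongly convex function, and let $x^* = \operatorname{argmin}_{x \in C} \phi(x)$ be the minimizer
of $\phi$. Then, for any $x \in C$,

$$\|x - x^*\|_2^2 \le \frac{2}{\sigma} \cdot (\phi(x) - \phi(x^*)).$$

Similarly, if $\phi$ is $\sigma$-strongly concave, and $x^* = \operatorname{argmax}_{x \in C} \phi(x)$, then for any $x \in C$,

$$\|x - x^*\|_2^2 \le \frac{2}{\sigma} \cdot (\phi(x^*) - \phi(x)).$$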

2.3 Tools for Zeroth-Order Optimization 

We briefly discuss a useful tool for noisy zeroth-order optimization (also known as bandit optimization)
by [BLNR15], which will be used as a black-box algorithm in our framework. The important feature we
require, satisfied by the algorithm from [BLNR15], is that the optimization procedure be able to tolerate a
small amount of adversarial noise.

Definition 6. Let $C$ be a convex set in $\mathbb{R}^d$. We say that $C$ is well-rounded if there exist $r, R > 0$ such that
$B_2^d(r) \subseteq C \subseteq B_2^d(R)$ and $R/r \le O(\sqrt{d})$, where $B_2^d(\gamma)$ denotes an $\ell_2$ ball of radius $\gamma$ in $\mathbb{R}^d$.

Let $C$ be a well-rounded convex set in $\mathbb{R}^d$ and $F, f : \mathbb{R}^d \to \mathbb{R}$ be functions such that $f$ is convex and $F$
satisfies

$$\sup_{x \in C} |F(x) - f(x)| \le \varepsilon/d, \tag{1}$$

for some $\varepsilon > 0$. The function $F$ can be seen as an oracle that gives a noisy evaluation of $f$ at any point in $C$.
Belloni et al. [BLNR15] give an algorithm that finds a point $x \in C$ that approximately optimizes the convex
function $f$ and only uses function evaluations of $F$ at points $x \in C$. The set $C$ only needs to be specified
via a membership oracle that decides if a point $x$ is in $C$ or not.

Lemma 7 ([BLNR15], Corollary 1). Let $C$ be a well-rounded set in $\mathbb{R}^d$ and $f$ and $F$ be functions that
satisfy Equation (1). There is an algorithm ZOO($\varepsilon$, $C$) (short for zeroth-order optimization) that makes
$\tilde{O}(d^{4.5})$ calls (see footnote 2) to $F$ and returns a point $x \in C$ such that

$$\mathbb{E}[f(x)] \le \min_{y \in C} f(y) + \varepsilon.$$

Naturally, the algorithm can also be used to approximately maximize a concave function.

2 The notation $\tilde{O}(\cdot)$ hides the logarithmic dependence on $d$ and $1/\varepsilon$.
3 Profit Maximization From Revealed Preferences


3.1 The Model and Problem Setup 

Consider the problem of maximizing profit from revealed preferences. In this problem, there is a producer
who wants to sell a bundle $x$ of $d$ divisible goods to a consumer. The bundles are vectors $x \in C$, where
$C \subseteq \mathbb{R}^d_+$ is some set of feasible bundles that we assume is known to both the producer and consumer.

• The producer has an unknown cost function $c : \mathbb{R}^d_+ \to \mathbb{R}_+$. He is allowed to set prices $p \in \mathbb{R}^d_+$ for
each good, and receives profit

$$r(p) = \langle p, x^*(p) \rangle - c(x^*(p)),$$

where $x^*(p)$ is the bundle of goods the consumer purchases at prices $p$. His goal is to find the
profit-maximizing prices

$$p^* = \operatorname{argmax}_{p \in \mathbb{R}^d_+} r(p).$$

• The consumer has a valuation function $v : \mathbb{R}^d_+ \to \mathbb{R}_+$. The valuation function is unknown to the
producer. The consumer has a quasi-linear utility function $u(x, p) = v(x) - \langle p, x \rangle$. Given prices $p$,
the consumer will buy the bundle $x^*(p) \in C$ that maximizes her utility. Thus,

$$x^*(p) = \operatorname{argmax}_{x \in C} u(x, p) = \operatorname{argmax}_{x \in C} \left( v(x) - \langle x, p \rangle \right).$$

We call $x^*(p)$ the induced bundle at prices $p$.

In our model, in each time period $t$ the producer will choose prices $p^t$ and can observe the resulting
induced bundle $x^*(p^t)$ and profit $r(p^t)$. We would like to design an algorithm so that after a polynomial
number of observations $T$, the profit $r(p^T)$ is nearly as large as the optimal profit $r(p^*)$.

We will make several assumptions about the functions c and v and the set C. We view these assumptions 
as comparatively mild: 

Assumption 3.1 (Set of Feasible Bundles). The set of feasible bundles $C \subseteq \mathbb{R}^d_+$ is convex and well-rounded.
It also contains the set $(0, 1]^d$, that is, $(0, 1]^d \subseteq C$ (the consumer can simultaneously buy at least one unit of each good).
Also, $\|C\|_2 \le \gamma$ (e.g., when $C = (0, 1]^d$, we have $\gamma = \sqrt{d}$). Lastly, $C$ is downward closed, in the sense that
for any $x \in C$, there exists some $\delta \in (0, 1)$ such that $\delta x \in C$ (the consumer can always choose to buy less of
each good).

Assumption 3.2 (Producer’s Cost Function). The producer’s cost function $c : \mathbb{R}^d_+ \to \mathbb{R}$ is convex and
Lipschitz-continuous.

Assumption 3.3 (Consumer’s Valuation Function). The consumer’s valuation function $v : \mathbb{R}^d_+ \to \mathbb{R}$ is
non-decreasing, Hölder-continuous, differentiable, and strongly concave over $C$. For any price vector $p \in \mathbb{R}^d_+$,
the induced bundle $x^*(p) = \operatorname{argmax}_{x \in C} u(x, p)$ is defined.

Note that without the assumption that the consumer’s valuation function is concave and that the
producer’s cost function is convex, even with full information, their corresponding optimization problems
would not be polynomial-time solvable. Our fourth assumption of homogeneity is more restrictive, but
as we observe, is satisfied by a wide range of economically meaningful valuation functions including CES
and Cobb-Douglas utilities. Informally, homogeneity is a scale-invariance condition: changing the units
by which quantities of goods are measured should have a predictable multiplicative effect on the buyer’s
valuation function.





Definition 8. For $k > 0$, a function $v : \mathbb{R}^d_+ \to \mathbb{R}_+$ is homogeneous of degree $k$ if for every $x \in \mathbb{R}^d_+$ and for
every $\sigma > 0$,

$$v(\sigma x) = \sigma^k v(x).$$

The function $v$ is simply homogeneous if it is homogeneous of degree $k$ for some $k > 0$.

Our fourth assumption is simply that the buyer’s valuation function is homogeneous of some degree:

Assumption 3.4. The consumer’s valuation function $v$ is homogeneous.

3.2 An Overview of Our Solution 

We present our solution in three main steps: 

1. First, we show that the profit function can be expressed as a concave function r(x) of the consumer’s 
induced bundle x, rather than as a (non-concave) function of the prices. 

2. Next, we show that for a given candidate bundle $x$, we can iteratively find prices $p$ such that $x \approx x^*(p)$.
That is, in each time period $s$ we can set prices $p^s$ and observe the purchased bundle $x^*(p^s)$, and after
a polynomial number of time periods $S$, we are guaranteed to find prices $p = p^S$ such that $x^*(p) \approx x$.
Once we have found such prices, we can observe the profit $r(x^*(p)) \approx r(x)$, which allows us to
simulate query access to $r(x)$.

3. Finally, we use our simulated query access to $r(x)$ as feedback to a bandit concave optimization
algorithm, which iteratively queries bundles $x$, and quickly converges to the profit-maximizing bundle.
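
The three steps above can be sketched end to end in the one-good running example ($v(x) = \sqrt{x}$, $c(x) = x$, $C = (0, 1]$). This is a simplified illustration under stated assumptions, not the paper's algorithm: bisection on the price stands in for the price-learning procedure (it relies on demand being monotone in price, which is specific to one dimension), and ternary search stands in for the bandit concave optimization algorithm of [BLNR15]. All function names are hypothetical.

```python
def market_query(price):
    """Hidden market, unknown to the producer: returns (purchased bundle, profit)."""
    bundle = 1.0 if price <= 0.5 else 1.0 / (4.0 * price * price)  # buyer's best response
    profit = price * bundle - bundle                               # revenue minus c(x) = x
    return bundle, profit


def price_inducing(target, query, price_cap=100.0, iters=60):
    """Step 2 (1-D stand-in): bisect on the price until the purchased bundle matches target."""
    lo, hi = 0.0, price_cap
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        bundle, _ = query(mid)
        if bundle > target:
            lo = mid  # demand too high: raise the price
        else:
            hi = mid  # demand at or below target: lower the price
    return hi


def simulated_profit(target, query):
    """Simulate an oracle for r(x): find an inducing price, then observe the profit."""
    price = price_inducing(target, query)
    _, profit = query(price)
    return profit, price


def maximize_profit(query, lo=0.01, hi=0.99, iters=40):
    """Step 3 (stand-in): ternary search over bundles, valid because r(x) is concave."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if simulated_profit(m1, query)[0] < simulated_profit(m2, query)[0]:
            lo = m1
        else:
            hi = m2
    best = 0.5 * (lo + hi)
    profit, price = simulated_profit(best, query)
    return best, price, profit


best_bundle, best_price, best_profit = maximize_profit(market_query)
```

In this example the concave profit $\sqrt{x}/2 - x$ is maximized at $x = 1/16$, which corresponds to the price $p = 2$ and profit $1/16$, so the search should return values close to these.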

3.3 Expressing Profit as a Function of the Bundle 

First, we carry out Step 1 above and demonstrate how to rewrite the profit function as a function of the
bundle $x$, rather than as a function of the prices $p$. Note that for any given bundle $x \in C$, there might be
multiple price vectors that induce $x$. We denote the set of price vectors that induce $x$ by:

$$P^*(x) = \{p \in \mathbb{R}^d_+ \mid x^*(p) = x\}.$$

We then define the profit of a bundle $x$ to be

$$r(x) = \max_{p \in P^*(x)} r(p) = \max_{p \in P^*(x)} \langle p, x \rangle - c(x).$$

Observe that the profit-maximizing price vector $p \in P^*(x)$ is the price vector that maximizes revenue
$\langle p, x \rangle$, since the cost $c(x)$ depends only on $x$, and so is the same for every $p \in P^*(x)$. The following lemma
characterizes the revenue-maximizing price vector that induces any fixed bundle $x \in C$.

Lemma 9. Let $x \in C$ be a bundle, and $P^*(x)$ be the set of price vectors that induce bundle $x$. Then the
price vector $p = \nabla v(x)$ is the revenue-maximizing price vector that induces $x$. That is, $\nabla v(x) \in P^*(x)$ and
for any price vector $p' \in P^*(x)$, $\langle p', x \rangle \le \langle \nabla v(x), x \rangle$.

Proof. Observe that for any $x \in C$, the gradient of the consumer’s utility $u(x, p) = v(x) - \langle p, x \rangle$ with
respect to $x$ is $\nabla v(x) - p$. If the prices are $p = \nabla v(x)$, then since $v$ is concave and $\nabla v(x) - p = 0$, $x$ is a
maximizer of the consumer’s utility function. Thus, we have $x^*(\nabla v(x)) = x$, and so $\nabla v(x) \in P^*(x)$.

Suppose that there exists another price vector $p' \in P^*(x)$ such that $p' \ne \nabla v(x)$. Since the function
$u(\cdot, p')$ is concave and $x \in \operatorname{argmax}_{x' \in C} u(x', p')$, we know that for any $x' \in C$,

$$\langle \nabla v(x) - p', x' - x \rangle \le 0,$$

otherwise there is a feasible ascent direction, which contradicts the assumption that $x$ maximizes $u(\cdot, p')$.
By Assumption 3.1, we know there exists some $\delta < 1$ such that $\delta x \in C$. Now consider $x' = \delta x$, so that
$x' - x = -(1 - \delta) x$; then it follows that

$$\langle \nabla v(x) - p', (1 - \delta) x \rangle = (1 - \delta) \left( \langle \nabla v(x), x \rangle - \langle p', x \rangle \right) \ge 0.$$

Therefore, $\langle p', x \rangle \le \langle \nabla v(x), x \rangle$, as desired. This completes the proof. □
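
The first half of Lemma 9 (pricing at the gradient induces the bundle) can be checked numerically. The sketch below assumes an illustrative Cobb-Douglas valuation with exponents $(0.3, 0.4)$ on $C = (0, 1]^2$ and compares the consumer's utility at the target bundle against a grid of alternatives; the exponents, the grid, and the function names are assumptions made for this example only.

```python
ALPHAS = (0.3, 0.4)  # illustrative Cobb-Douglas exponents, summing to less than 1


def valuation(x):
    """Cobb-Douglas valuation v(x) = prod_i x_i^{alpha_i}."""
    value = 1.0
    for xi, ai in zip(x, ALPHAS):
        value *= xi ** ai
    return value


def valuation_gradient(x):
    """Analytic gradient: dv/dx_i = alpha_i * v(x) / x_i."""
    v = valuation(x)
    return [ai * v / xi for xi, ai in zip(x, ALPHAS)]


def utility(x, prices):
    """Quasi-linear utility u(x, p) = v(x) - <p, x>."""
    return valuation(x) - sum(p * xi for p, xi in zip(prices, x))


target = [0.5, 0.8]
prices = valuation_gradient(target)  # candidate inducing prices p = grad v(target)

# No bundle on a grid over (0, 1]^2 should give the consumer more utility than the target.
grid = [k / 50.0 for k in range(1, 51)]
best_alternative = max(utility([a, b], prices) for a in grid for b in grid)
target_utility = utility(target, prices)
```

Because $u(\cdot, p)$ is concave and its gradient vanishes at the target when $p = \nabla v(x)$, no alternative bundle can do better, and the grid search is consistent with that.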

With this characterization of the revenue-maximizing price vector, we can then rewrite the profit as a
function of $x$ in closed form for any $x \in C$:

$$r(x) = \langle \nabla v(x), x \rangle - c(x). \tag{2}$$

Next, we show that $r(x)$ is a concave function of $x$ whenever the valuation $v$ satisfies Assumption 3.3 (concavity
and differentiability) and Assumption 3.4 (homogeneity).

Theorem 10. If the consumer’s valuation function $v$ is differentiable, homogeneous, and concave over $C$,
then the producer’s profit function $r(x) = \langle \nabla v(x), x \rangle - c(x)$ is concave over the domain $C$.

To prove this result, we invoke Euler’s theorem for homogeneous functions:

Theorem 11 (Euler’s Theorem for Homogeneous Functions). Let $v : C \to \mathbb{R}_+$ be continuous and
differentiable. Then $v$ is homogeneous of degree $k$ if and only if

$$\langle \nabla v(x), x \rangle = k \cdot v(x).$$

Proof of Theorem 10. Recall that

$$r(x) = \langle \nabla v(x), x \rangle - c(x).$$

By the assumption that $v$ is continuous, differentiable, and homogeneous of some degree $k > 0$, we have by
Euler’s theorem that

$$r(x) = k v(x) - c(x).$$

Because, by assumption, $v(x)$ is concave and $c(x)$ is convex, we conclude that $r(x)$ is concave. □

Finally, we note that many important and well-studied classes of valuation functions satisfy our
assumptions, namely differentiability, strong concavity, and homogeneity. Two classes of interest include:

• Constant Elasticity of Substitution (CES). Valuation functions of the form

$$v(x) = \left( \sum_{i=1}^{d} \alpha_i x_i^{\rho} \right)^{\beta},$$

where $\alpha_i > 0$ for every $i \in [d]$ and $\rho, \beta > 0$ such that $\rho < 1$ and $\beta\rho < 1$. These functions are known
to be differentiable, Hölder continuous, and strongly concave over the set $(0, H]^d$ (see Appendix B.1
for a proof). Observe that $v(\sigma x) = \left( \sum_{i=1}^{d} \alpha_i (\sigma x_i)^{\rho} \right)^{\beta} = \sigma^{\rho\beta} \left( \sum_{i=1}^{d} \alpha_i x_i^{\rho} \right)^{\beta} = \sigma^{\rho\beta} v(x)$, so these
functions are homogeneous of degree $k = \rho\beta$.

• Cobb-Douglas. These are valuation functions of the form

$$v(x) = \prod_{i=1}^{d} x_i^{\alpha_i},$$

where $\alpha_i > 0$ for every $i \in [d]$ and $\sum_{i=1}^{d} \alpha_i < 1$. These functions are known to be differentiable,
Hölder continuous, and strongly concave over the set $(0, H]^d$ (see Appendix B.1 for a proof). Observe
that $v(\sigma x) = \prod_{i=1}^{d} (\sigma x_i)^{\alpha_i} = \left( \prod_{i=1}^{d} \sigma^{\alpha_i} \right) \left( \prod_{i=1}^{d} x_i^{\alpha_i} \right) = \sigma^{\sum_{i=1}^{d} \alpha_i} \cdot v(x)$, so these functions are
homogeneous of degree $k = \sum_{i=1}^{d} \alpha_i$.
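
Euler's identity $\langle \nabla v(x), x \rangle = k \cdot v(x)$ can be checked numerically for both families. The sketch below uses hypothetical parameter values ($\alpha = (1, 2)$, $\rho = 0.5$, $\beta = 1.5$ for CES and $\alpha = (0.3, 0.4)$ for Cobb-Douglas) and a finite-difference gradient, so the identity is only verified up to numerical error.

```python
def ces(x, alphas=(1.0, 2.0), rho=0.5, beta=1.5):
    """CES valuation (sum_i alpha_i x_i^rho)^beta; homogeneous of degree rho * beta."""
    return sum(a * xi ** rho for a, xi in zip(alphas, x)) ** beta


def cobb_douglas(x, alphas=(0.3, 0.4)):
    """Cobb-Douglas valuation prod_i x_i^{alpha_i}; homogeneous of degree sum(alphas)."""
    value = 1.0
    for xi, a in zip(x, alphas):
        value *= xi ** a
    return value


def numerical_gradient(f, x, h=1e-6):
    """Central finite-difference approximation of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad


def euler_gap(f, x, degree):
    """Return <grad f(x), x> - degree * f(x), which Euler's theorem says is zero."""
    grad = numerical_gradient(f, x)
    return sum(g * xi for g, xi in zip(grad, x)) - degree * f(x)


x = [0.5, 0.8]
ces_gap = euler_gap(ces, x, 0.5 * 1.5)               # degree k = rho * beta
cobb_douglas_gap = euler_gap(cobb_douglas, x, 0.7)   # degree k = 0.3 + 0.4
```

Both gaps should be zero up to finite-difference error, which is what lets the revenue term $\langle \nabla v(x), x \rangle$ in the profit be replaced by $k \cdot v(x)$.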

3.4 Converting Bundles to Prices 

Next, we carry out Step 2 and show how to find prices $p$ to induce a given bundle $\hat{x}$. Specifically, the
producer has a target bundle $\hat{x} \in C$ in mind, and would like to learn a price vector $p \in \mathbb{R}^d_+$ such that the
induced bundle $x^*(p)$ is “close” to $\hat{x}$. That is,

$$\|\hat{x} - x^*(p)\|_2 \le \varepsilon,$$

for some $\varepsilon > 0$.

Our solution will actually only allow us to produce a price vector $p$ such that $\hat{x}$ and $x^*(p)$ are “close in
value.” That is,

$$|u(\hat{x}, p) - u(x^*(p), p)| \le \delta.$$

However, by strong concavity of the valuation function, this will be enough to guarantee that the actual
bundle is close to the target bundle. The following is just an elaboration of Assumption 3.3:

Assumption 3.5 (Quantitative version of Assumption 3.3). The valuation function $v$ is both

1. $(\lambda_{\mathrm{val}}, \beta)$-Hölder continuous over the domain $C$ with respect to the $\ell_2$ norm: for all $x, x' \in C$,

$$|v(x) - v(x')| \le \lambda_{\mathrm{val}} \cdot \|x - x'\|_2^{\beta},$$

for some constants $\lambda_{\mathrm{val}} \ge 1$ and $\beta \in (0, 1]$, and

2. $\sigma$-strongly concave over the interior of $C$: for all $x, x' \in C$,

$$v(x') \le v(x) + \langle \nabla v(x), x' - x \rangle - (\sigma/2) \cdot \|x - x'\|_2^2.$$


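Our algorithm LearnPrice($\hat{x}, \varepsilon$) is given as Algorithm 1. We will prove:


Theorem 12. Let $\hat{x} \in \mathcal{C}$ be a target bundle and $\varepsilon > 0$. Then LearnPrice($\hat{x}, \varepsilon$) outputs a price vector $\hat{p}$
such that the induced bundle satisfies $\|\hat{x} - x^*(\hat{p})\| \le \varepsilon$, and the number of observations it needs is no more
than

$$T = d \cdot \mathrm{poly}\Big(\frac{1}{\varepsilon}, \frac{1}{\sigma}, \gamma, \lambda_{val}\Big).$$

Algorithm 1 Learning the price vector to induce a target bundle: LearnPrice($\hat{x}, \varepsilon$)
Input: A target bundle $\hat{x} \in \mathcal{C}$, and target accuracy $\varepsilon$

Initialize: restricted price space $\mathcal{P} = \{p \in \mathbb{R}^d_+ \mid \|p\| \le \sqrt{d} L\}$ where

$$L = (\lambda_{val})^{1/\beta} \Big(\frac{4}{\varepsilon^2 \sigma}\Big)^{(1-\beta)/\beta}, \qquad p^1_j = 0 \text{ for all goods } j \in [d], \qquad T = \frac{32\, d L^2 \gamma^2}{\varepsilon^4 \sigma^2}, \qquad \eta = \frac{\sqrt{2}\,\gamma}{L \sqrt{dT}}$$

For $t = 1, \ldots, T$:

Observe the bundle $x^*(p^t)$ purchased by the consumer

Update the price vector with projected subgradient descent:

$\tilde{p}^{t+1}_j = p^t_j - \eta\,(\hat{x}_j - x^*(p^t)_j)$ for each $j \in [d]$, $\quad p^{t+1} = \Pi_{\mathcal{P}}[\tilde{p}^{t+1}]$


Output: $\hat{p} = \frac{1}{T} \sum_{t=1}^{T} p^t$.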
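To make the update rule concrete, the following is a minimal sketch of the price dynamics in Algorithm 1 against a simulated buyer. It is an illustration under stated assumptions, not the paper's implementation: the buyer `sqrt_consumer` (valuation $v(x) = \sum_i 2\sqrt{x_i}$ on $[0,1]^d$), the function names, and the hand-picked `radius`, `eta` and `steps` are all hypothetical, whereas the paper derives the radius, step size and iteration count from $\lambda_{val}$, $\beta$, $\sigma$, $\gamma$ and $\varepsilon$.

```python
import math

def project(p, radius):
    """Euclidean projection onto {p >= 0, ||p||_2 <= radius}."""
    q = [max(0.0, pj) for pj in p]
    norm = math.sqrt(sum(qj * qj for qj in q))
    return q if norm <= radius else [qj * radius / norm for qj in q]

def learn_price(target, best_response, radius, eta, steps):
    """Projected subgradient descent on prices; returns the average iterate."""
    d = len(target)
    p = [0.0] * d
    total = [0.0] * d
    for _ in range(steps):
        total = [s + pj for s, pj in zip(total, p)]
        bought = best_response(p)  # revealed-preference feedback x*(p^t)
        moved = [pj - eta * (xh - xb) for pj, xh, xb in zip(p, target, bought)]
        p = project(moved, radius)
    return [s / steps for s in total]

def sqrt_consumer(p):
    """Hypothetical buyer: maximizes sum_i 2*sqrt(x_i) - p_i*x_i over [0, 1]^d."""
    return [1.0 if pj <= 1.0 else 1.0 / (pj * pj) for pj in p]

target = [0.25, 0.5]
p_hat = learn_price(target, sqrt_consumer, radius=10.0, eta=0.5, steps=2000)
induced = sqrt_consumer(p_hat)
error = math.sqrt(sum((a - b) ** 2 for a, b in zip(target, induced)))
```

The producer never evaluates the valuation itself: the only feedback used is the purchased bundle. For this assumed buyer the inducing prices are $1/\sqrt{\hat{x}_j}$, so the averaged prices should approach $(2, \sqrt{2})$ and the induced bundle should approach the target.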


To analyze LearnPrice($\hat{x}, \varepsilon$), we will start by defining the following convex program whose solution is
the target bundle $\hat{x}$.

$$\max_{x \in \mathcal{C}} v(x) \qquad (3)$$

$$\text{such that } x_j \le \hat{x}_j \text{ for every good } j \in [d] \qquad (4)$$

Since $v$ is non-decreasing, it is not hard to see that $\hat{x}$ is the optimal solution. The partial Lagrangian of
this program is defined as follows,

$$\mathcal{L}(x, p) = v(x) - \sum_{j=1}^{d} p_j (x_j - \hat{x}_j),$$

where $p_j$ is the dual variable for each constraint (4) and is interpreted as the price of good $j$. By strong
duality, we know that there is a value OPT such that

$$\max_{x \in \mathcal{C}} \min_{p \in \mathbb{R}^d_+} \mathcal{L}(x, p) = \min_{p \in \mathbb{R}^d_+} \max_{x \in \mathcal{C}} \mathcal{L}(x, p) = \mathrm{OPT} = v(\hat{x}). \qquad (5)$$

We know that OPT $= v(\hat{x})$ because $\hat{x}$ is the optimal solution to (3)-(4).

We can also define the Lagrange dual function $g \colon \mathbb{R}^d \to \mathbb{R}$ to be

$$g(p) = \max_{x \in \mathcal{C}} \mathcal{L}(x, p).$$

We will show that an approximately optimal price vector for $g$ approximately induces the target bundle
$\hat{x}$, and that LearnPrice($\hat{x}, \varepsilon$) is using projected subgradient descent to find such a solution to $g$. In order
to reason about the convergence rate of the algorithm, we restrict the space of the prices to the following
bounded set:

$$\mathcal{P} = \Big\{ p \in \mathbb{R}^d_+ \;\Big|\; \|p\|_2 \le \sqrt{d}\, (\lambda_{val})^{1/\beta} \Big(\frac{4}{\varepsilon^2 \sigma}\Big)^{(1-\beta)/\beta} \Big\}. \qquad (6)$$

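First, we can show that the minimax value of the Lagrangian remains close to OPT even if we restrict
the prices to the set $\mathcal{P}$.

Lemma 13. There exists a value R-OPT such that

$$\max_{x \in \mathcal{C}} \min_{p \in \mathcal{P}} \mathcal{L}(x, p) = \min_{p \in \mathcal{P}} \max_{x \in \mathcal{C}} \mathcal{L}(x, p) = \text{R-OPT}.$$

Moreover, $v(\hat{x}) \le \text{R-OPT} \le v(\hat{x}) + \frac{\varepsilon^2 \sigma}{4}$.

Proof. Since $\mathcal{C}$ and $\mathcal{P}$ are both convex and $\mathcal{P}$ is also compact, the minimax theorem [Sio58] shows that
there is a value R-OPT such that

$$\max_{x \in \mathcal{C}} \min_{p \in \mathcal{P}} \mathcal{L}(x, p) = \min_{p \in \mathcal{P}} \max_{x \in \mathcal{C}} \mathcal{L}(x, p) = \text{R-OPT}. \qquad (7)$$

Since $\mathcal{P} \subseteq \mathbb{R}^d_+$, by (5) we have R-OPT $\ge v(\hat{x})$. Thus, we only need to show that R-OPT $\le v(\hat{x}) + \alpha$,
where $\alpha = \varepsilon^2 \sigma / 4$. Let $(x^\bullet, p^\bullet)$ be a pair of minimax strategies for (7). That is,

$$x^\bullet \in \arg\max_{x \in \mathcal{C}} \min_{p \in \mathcal{P}} \mathcal{L}(x, p) \quad \text{and} \quad p^\bullet \in \arg\min_{p \in \mathcal{P}} \max_{x \in \mathcal{C}} \mathcal{L}(x, p).$$

It suffices to show that $\mathcal{L}(x^\bullet, p^\bullet) \le v(\hat{x}) + \alpha$. Suppose not; then we have

$$v(\hat{x}) + \alpha < \mathcal{L}(x^\bullet, p^\bullet) = \min_{p \in \mathcal{P}} \mathcal{L}(x^\bullet, p) = v(x^\bullet) - \max_{p \in \mathcal{P}} \langle p, x^\bullet - \hat{x} \rangle \le v(x^\bullet).$$

Now consider the bundle $y$ such that $y_j = \max\{x^\bullet_j, \hat{x}_j\}$ for each $j \in [d]$. It is clear that $v(y) \ge
v(x^\bullet) > v(\hat{x})$. Let $L = (\lambda_{val})^{1/\beta} (1/\alpha)^{(1-\beta)/\beta}$; then we can construct the following price vector $p' \in \mathcal{P}$
such that $p'_j = L$ for each good $j$ with $x^\bullet_j > \hat{x}_j$, and $p'_j = 0$ for all other goods. Since we assume that $v$ is
$(\lambda_{val}, \beta)$-Hölder continuous with respect to the $\ell_2$ norm, we have

$$v(x^\bullet) - v(\hat{x}) \le v(y) - v(\hat{x}) \le \lambda_{val} \|y - \hat{x}\|_2^{\beta} \le \lambda_{val} \|y - \hat{x}\|_1^{\beta}.$$

It follows that

$$\begin{aligned}
v(\hat{x}) + \alpha < \mathcal{L}(x^\bullet, p^\bullet) &\le \mathcal{L}(x^\bullet, p') \\
&= v(x^\bullet) - \langle p', x^\bullet - \hat{x} \rangle \\
&= v(x^\bullet) - L \sum_{j \colon x^\bullet_j > \hat{x}_j} (x^\bullet_j - \hat{x}_j) \\
&= v(x^\bullet) - L \|y - \hat{x}\|_1 \le v(y) - L \|y - \hat{x}\|_2.
\end{aligned}$$

Suppose that $\|y - \hat{x}\|_2 \ge 1$ or $\beta = 1$; then we know that $\|y - \hat{x}\|_2^{\beta} \le \|y - \hat{x}\|_2$. This means $v(\hat{x}) + \alpha <
v(y) - L \|y - \hat{x}\|_2 \le v(y) - \lambda_{val} \|y - \hat{x}\|_2^{\beta} \le v(\hat{x})$, a contradiction.

Next suppose that $\|y - \hat{x}\|_2 < 1$ and $\beta \in (0,1)$. We also have that

$$\begin{aligned}
\alpha &< v(y) - v(\hat{x}) - L \|y - \hat{x}\|_2 \\
&\le \lambda_{val} \|y - \hat{x}\|_2^{\beta} - L \|y - \hat{x}\|_2 \\
&= \lambda_{val} \|y - \hat{x}\|_2^{\beta} \Big(1 - \frac{L}{\lambda_{val}} \|y - \hat{x}\|_2^{1-\beta}\Big).
\end{aligned}$$

Since $\alpha > 0$, it must be that $\big(1 - \frac{L}{\lambda_{val}} \|y - \hat{x}\|_2^{1-\beta}\big)$ is also positive, and so $\|y - \hat{x}\|_2 < (\lambda_{val}/L)^{1/(1-\beta)}$. By
the choice of our $L$,

$$\alpha < \lambda_{val} \Big(\frac{\lambda_{val}}{L}\Big)^{\beta/(1-\beta)} = \alpha,$$

which is a contradiction. Therefore, the minimax value of (7) is no more than $v(\hat{x}) + \alpha$. □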

The preceding lemma shows that $\hat{x}$ is a primal optimal solution (even when prices are restricted). Therefore,
if $\hat{p} = \arg\min_{p \in \mathcal{P}} g(p)$ are the prices that minimize the Lagrangian dual, we must have that $\hat{x} = x^*(\hat{p})$
is the induced bundle at prices $\hat{p}$. The next lemma shows that if $p'$ are prices that approximately minimize
the Lagrangian dual, then the induced bundle $x^*(p')$ is close to $\hat{x}$.

Lemma 14. Let $p' \in \mathcal{P}$ be a price vector such that $g(p') \le \min_{p \in \mathcal{P}} g(p) + \alpha$. Let $x' = x^*(p')$ be the
induced bundle at prices $p'$. Then $x'$ satisfies

$$\|x' - \hat{x}\| \le 2\sqrt{\alpha/\sigma}.$$


Proof. Let R-OPT denote the Lagrangian value when we restrict the price space to $\mathcal{P}$. From Lemma 13, we
have that R-OPT $= \min_{p \in \mathcal{P}} g(p) \in [v(\hat{x}), v(\hat{x}) + \alpha]$. By assumption, we also have

$$g(p') = \mathcal{L}(x', p') \le \text{R-OPT} + \alpha \le v(\hat{x}) + 2\alpha.$$

Note that $\mathcal{L}(\hat{x}, p') = v(\hat{x}) - \langle p', \hat{x} - \hat{x} \rangle = v(\hat{x})$ and $x'$ is the maximizer of $\mathcal{L}(\cdot, p')$, so it follows that

$$0 \le \mathcal{L}(x', p') - \mathcal{L}(\hat{x}, p') \le 2\alpha.$$

Since we know that $v$ is a $\sigma$-strongly concave function over $\mathcal{C}$, the utility function $u(\cdot, p') = v(\cdot) - \langle p', \cdot \rangle$ is
also $\sigma$-strongly concave over $\mathcal{C}$.³ Then we have the following by Lemma 5 and the above argument,

$$2\alpha \ge \mathcal{L}(x', p') - \mathcal{L}(\hat{x}, p') = u(x', p') - u(\hat{x}, p') \ge \frac{\sigma}{2} \|x' - \hat{x}\|^2. \qquad (8)$$

This means $\|x' - \hat{x}\| \le 2\sqrt{\alpha/\sigma}$. □

Based on Lemma 14, we can reduce the problem of finding the appropriate prices to induce the target
bundle to finding an approximately optimal solution to $\min_{p \in \mathcal{P}} g(p)$. Even though the function $g$ is
unknown to the producer (because $v$ is unknown), we can still approximately optimize the function using
projected subgradient descent if we are provided access to subgradients of $g$. The next lemma shows that
the bundle $x^*(p)$ purchased by the consumer gives a subgradient of the Lagrange dual objective function at $p$.

Lemma 15. Let $p$ be any price vector, and $x^*(p)$ be the induced bundle. Then

$$(\hat{x} - x^*(p)) \in \partial g(p).$$

³ If $f(\cdot)$ is a $\sigma$-strongly concave function over $\mathcal{C}$ and $g(\cdot)$ is a concave function over $\mathcal{C}$, then $(f + g)(\cdot)$ is a $\sigma$-strongly concave
function over $\mathcal{C}$.

Proof. Given $x' = \arg\max_{x \in \mathcal{C}} \mathcal{L}(x, p)$, we know by the envelope theorem that a subgradient of $g$ can
be obtained as follows:

$$\frac{\partial g}{\partial p_j} = \hat{x}_j - x'_j \quad \text{for each } j \in [d].$$

Note that $x'$ corresponds to the induced bundle of $p$ because

$$\begin{aligned}
x' &= \arg\max_{x \in \mathcal{C}} \mathcal{L}(x, p) \\
&= \arg\max_{x \in \mathcal{C}} \left[ v(x) - \langle p, x - \hat{x} \rangle \right] \\
&= \arg\max_{x \in \mathcal{C}} \left[ v(x) - \langle p, x \rangle \right] \\
&= \arg\max_{x \in \mathcal{C}} u(x, p) = x^*(p).
\end{aligned}$$

Therefore, the vector $(\hat{x} - x^*(p))$ is a subgradient of $g$ at the price vector $p$. □
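As a sanity check on Lemma 15, the subgradient inequality $g(q) \ge g(p) + \langle \hat{x} - x^*(p), q - p \rangle$ can be tested numerically on a toy instance. The sketch below assumes a hypothetical buyer with valuation $v(x) = \sum_i 2\sqrt{x_i}$ on $[0,1]^d$, for which the best response has a closed form; the function names are ours, not the paper's.

```python
import math
import random

def best_response(p):
    """x*(p) for the assumed valuation v(x) = sum_i 2*sqrt(x_i) on [0, 1]^d."""
    return [1.0 if pj <= 1.0 else 1.0 / (pj * pj) for pj in p]

def dual(p, target):
    """Lagrange dual g(p) = max_x [v(x) - <p, x - target>], evaluated at x*(p)."""
    x = best_response(p)
    value = sum(2.0 * math.sqrt(xj) for xj in x)
    return value - sum(pj * (xj - tj) for pj, xj, tj in zip(p, x, target))

def subgradient_gap(p, q, target):
    """g(q) - g(p) - <target - x*(p), q - p>; nonnegative if Lemma 15 holds."""
    x = best_response(p)
    linear = sum((tj - xj) * (qj - pj) for tj, xj, qj, pj in zip(target, x, q, p))
    return dual(q, target) - dual(p, target) - linear

rng = random.Random(0)
target = [0.25, 0.5]
gaps = [
    subgradient_gap([rng.uniform(0.0, 4.0) for _ in target],
                    [rng.uniform(0.0, 4.0) for _ in target],
                    target)
    for _ in range(1000)
]
min_gap = min(gaps)
```

On this instance `min_gap` should be nonnegative up to floating-point error, which is exactly the statement that $\hat{x} - x^*(p)$ is a subgradient of the convex function $g$ at $p$.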

Now that we know the subgradients of the function $g$ at $p$ can be easily obtained from the induced bundle
purchased by the consumer, it remains to observe that Algorithm LearnPrice($\hat{x}, \varepsilon$) is performing projected
gradient descent on the Lagrange dual objective, and to analyze its convergence.


Proof of Theorem 12. By Lemma 14, it suffices to show that the price vector $\hat{p}$ returned by projected gradient
descent satisfies

$$g(\hat{p}) \le \min_{p \in \mathcal{P}} g(p) + \frac{\varepsilon^2 \sigma}{4}.$$

Note that the set $\mathcal{P}$ is contained in the $\ell_2$ ball centered at 0 with radius $\sqrt{d} L$. Also, for each $p^t$, the subgradient
we obtain is bounded: $\|\hat{x} - x^*(p^t)\| \le \sqrt{\|\hat{x}\|^2 + \|x^*(p^t)\|^2} \le \sqrt{2}\,\gamma$ since $\|\mathcal{C}\| \le \gamma$. Since we set

$$T = \frac{32\, d L^2 \gamma^2}{\varepsilon^4 \sigma^2}, \qquad \eta = \frac{\sqrt{2}\,\gamma}{L \sqrt{dT}},$$

we can apply the guarantee of projected gradient descent from Theorem 3, which gives:

$$g(\hat{p}) \le \min_{p \in \mathcal{P}} g(p) + \frac{\sqrt{2d}\, L \gamma}{\sqrt{T}} = \min_{p \in \mathcal{P}} g(p) + \frac{\varepsilon^2 \sigma}{4}.$$

By Lemma 14 (applied with $\alpha = \varepsilon^2 \sigma / 4$), we know that the resulting bundle $x^*(\hat{p})$ satisfies $\|\hat{x} - x^*(\hat{p})\| \le \varepsilon$. □


Remark 16. Since noise tolerance is not required in this setting, it is possible to approximately induce the
target bundle using a number of observations that is only poly-logarithmic in $1/\varepsilon$. We will give an
ellipsoid-based variant of LearnPrice in Appendix D that achieves this guarantee.


3.5 Profit Maximization 

Finally, we will show how to combine the algorithm LearnPrice with the zeroth order optimization algorithm
ZOO to find the approximate profit-maximizing price vector. At a high level, we will use ZOO to
(approximately) optimize the profit function $r$ over the bundle space and use LearnPrice to (approximately)
induce the optimal bundle.

Before we show how to use ZOO, we will verify that if we run the algorithm LearnPrice to obtain
prices $\hat{p}$ that approximately induce the desired bundle $\hat{x}$, and observe the revenue generated from prices $\hat{p}$,
we will indeed obtain an approximation to the revenue function $r(\hat{x})$.

Recall from Lemma 9 that the profit function can be written as a function of the bundle

$$r(x) = \langle \nabla v(x), x \rangle - c(x)$$

as long as the producer uses the profit-maximizing price vector $\nabla v(x)$ to induce the bundle $x$. However,
the price vector returned by LearnPrice might not be the optimal price vector for the induced bundle. In
order to have an estimate of the optimal profit for each bundle, we need to guarantee that prices returned
by LearnPrice are the profit-maximizing ones. To do that, we will restrict the bundle space that ZOO is
optimizing over to be the interior of $\mathcal{C}$. Now we show that for every bundle in the interior of $\mathcal{C}$, there is a
unique price vector that induces that bundle. Thus, these prices are the profit-maximizing prices inducing
that bundle.

Lemma 17. Let $x'$ be a bundle in $\mathrm{Int}_{\mathcal{C}}$. Then $\nabla v(x')$ is the unique price vector that induces $x'$.

Proof. Let $p'$ be a price vector such that $x^*(p') = x'$. Since $\mathrm{Int}_{\mathcal{C}} \subseteq \mathcal{C}$, we must have

$$x' = \arg\max_{x \in \mathrm{Int}_{\mathcal{C}}} \left[ v(x) - \langle p', x \rangle \right].$$

By the definition of $\mathrm{Int}_{\mathcal{C}}$, we know that there exists some $\delta > 0$ such that the ball $\delta B_{x'}$ is contained in $\mathcal{C}$.
Now consider the function $f \colon \mathbb{R}^d \to \mathbb{R}$ such that $f(x) = u(x, p')$. It follows that $x'$ is a local optimum
of $f$ in the neighborhood $\delta B_{x'}$. Since $f$ is continuously differentiable, we must have $\nabla f(x') = 0$ by first-order
conditions. Therefore, we must have

$$\nabla f(x') = \nabla v(x') - p' = 0,$$

which implies that $p' = \nabla v(x')$. □
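Lemma 17 can be illustrated numerically: if we post the prices $\nabla v(x')$ for an interior bundle $x'$, no other bundle should give the consumer higher utility. The sketch below assumes a hypothetical two-good Cobb-Douglas valuation; the exponents in `ALPHA`, the target bundle and the random sampling are illustrative choices, not taken from the paper.

```python
import math
import random

ALPHA = [0.3, 0.4]  # hypothetical Cobb-Douglas exponents, summing to 0.7 < 1

def valuation(x):
    """v(x) = prod_i x_i ** alpha_i."""
    return math.prod(xi ** a for xi, a in zip(x, ALPHA))

def gradient(x):
    """Analytic gradient: dv/dx_i = alpha_i * v(x) / x_i."""
    v = valuation(x)
    return [a * v / xi for xi, a in zip(x, ALPHA)]

def utility(x, p):
    """Quasilinear consumer utility u(x, p) = v(x) - <p, x>."""
    return valuation(x) - sum(pj * xj for pj, xj in zip(p, x))

target = [0.5, 0.6]
prices = gradient(target)  # the candidate inducing prices from Lemma 17

rng = random.Random(0)
worst_gap = min(
    utility(target, prices)
    - utility([rng.uniform(0.01, 1.0) for _ in target], prices)
    for _ in range(5000)
)
```

Because $u(\cdot, p)$ is concave and its gradient vanishes at the target when $p = \nabla v(x')$, `worst_gap` should be nonnegative: none of the sampled bundles beats the target at these prices.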

Instead of using the interior itself, we will use a simple and efficiently computable proxy for the interior
obtained by slightly shifting and contracting $\mathcal{C}$.

Claim 18. For any $0 < \delta < 1/2$, let the set

$$\mathcal{C}_\delta = (1 - 2\delta)\,\mathcal{C} + \delta \mathbf{1},$$

where $\mathbf{1}$ denotes the $d$-dimensional vector with 1 in each coordinate. Given Assumption 3.1, $\mathcal{C}_\delta$ is contained in the
$(\delta/2)$-interior of $\mathcal{C}$. That is, $\mathcal{C}_\delta \subseteq \mathrm{Int}_{\mathcal{C}, \delta/2}$.

Proof. Our goal is to show that $\mathcal{C}_\delta + (\delta/2) B_0 \subseteq \mathcal{C}$, where $B_0$ denotes the unit ball centered at 0. Any point in
$\mathcal{C}_\delta + (\delta/2) B_0$ can be written as $x' + (\delta/2) y'$ for $x' \in \mathcal{C}_\delta$ and $y' \in B_0$. We will show that $x' + (\delta/2) y' \in \mathcal{C}$.
Since $x' \in \mathcal{C}_\delta$, there exists $x \in \mathcal{C}$ such that

$$x' = (1 - 2\delta) x + \delta \mathbf{1}.$$

Since $y' \in B_0$, there exists $y \in (0, 1]^d$ such that

$$\tfrac{1}{2} y' = 2y - \mathbf{1}.$$

To see this, note that $(0, 1]^d$ contains a ball of radius 1/4 whose center is $(1/2) \cdot \mathbf{1}$. By Assumption 3.1, $\mathcal{C}$
contains $(0, 1]^d$, so $y \in \mathcal{C}$. Therefore for some $x, y \in \mathcal{C}$,

$$x' + (\delta/2) y' = (1 - 2\delta) x + \delta \mathbf{1} + 2\delta y - \delta \mathbf{1} = (1 - 2\delta) x + 2\delta y \in \mathcal{C},$$

where we used convexity of $\mathcal{C}$. Hence, $x' + (\delta/2) y' \in \mathcal{C}$, as desired. □
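The shift-and-contract map in Claim 18 is simple to implement. As a minimal sketch, assume (hypothetically) that the feasible set is the box $[0, 2]^d$, which contains $(0,1]^d$; for a box, the distance from an interior point to the boundary is the smallest coordinate-wise slack, so the claim can be checked directly. The function names and parameter values are ours.

```python
import random

def shrink(x, delta):
    """Map x in C to (1 - 2*delta) * x + delta * 1, a point of C_delta."""
    return [(1.0 - 2.0 * delta) * xj + delta for xj in x]

def box_margin(x, upper):
    """Distance from x to the boundary of the box [0, upper]^d (negative if outside)."""
    return min(min(xj, upper - xj) for xj in x)

rng = random.Random(1)
upper, delta, d = 2.0, 0.1, 5   # hypothetical box size, shrinkage and dimension
margins = [
    box_margin(shrink([rng.uniform(0.0, upper) for _ in range(d)], delta), upper)
    for _ in range(1000)
]
smallest_margin = min(margins)
```

Every shrunken point should keep a margin of at least $\delta/2$ from the boundary, so a ball of radius $\delta/2$ around it stays inside the box, as the claim asserts.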

We will let ZOO operate on the set $\mathcal{C}_\delta$ instead of $\mathcal{C}$, and we first want to show that there is little loss
in profit if we restrict the induced bundle to $\mathcal{C}_\delta$. The following is just a formal, quantitative version of
Assumption 3.2:

Assumption 3.6 (Quantitative version of Assumption 3.2). The producer's cost function $c \colon \mathbb{R}^d_+ \to \mathbb{R}$ is
$\lambda_{cost}$-Lipschitz over the domain $\mathcal{C}$ with respect to the $\ell_2$ norm: for $x, x' \in \mathcal{C}$,

$$|c(x) - c(x')| \le \lambda_{cost} \|x - x'\|.$$

Given this assumption, the profit function is also Hölder continuous.

Lemma 19. For any $x, y \in \mathcal{C}$ such that $\|x - y\| \le 1$, the following holds:

$$|r(x) - r(y)| \le (\lambda_{val} + \lambda_{cost}) \|x - y\|^{\beta}.$$

Proof. Recall the revenue component of the profit function is $\langle \nabla v(x), x \rangle$. Since $v$ is a concave and homogeneous
function, we know that the homogeneity degree satisfies $k \le 1$ (see Appendix B for a proof). By
Euler's theorem (Theorem 11),

$$\langle \nabla v(x), x \rangle = k \cdot v(x). \qquad (9)$$

Since $v$ is $(\lambda_{val}, \beta)$-Hölder continuous over $\mathcal{C}$, by Equation 9 we know that the revenue $\langle \nabla v(x), x \rangle$ is also
$(\lambda_{val}, \beta)$-Hölder continuous over $\mathcal{C}$. Furthermore, since the cost function $c$ is $\lambda_{cost}$-Lipschitz over $\mathcal{C}$, the profit
function satisfies the following: for any $x, y \in \mathcal{C}$ such that $\|x - y\| \le 1$, we have

$$|r(x) - r(y)| \le |\langle \nabla v(x), x \rangle - \langle \nabla v(y), y \rangle| + |c(x) - c(y)| \le \lambda_{val} \|x - y\|^{\beta} + \lambda_{cost} \|x - y\|.$$

Since $\|x - y\| \le 1$, we know that $\|x - y\|^{\beta} \ge \|x - y\|$, so $|r(x) - r(y)| \le (\lambda_{val} + \lambda_{cost}) \|x - y\|^{\beta}$. □

We can bound the difference between the optimal profits in $\mathcal{C}_\delta$ and $\mathcal{C}$.

Lemma 20. For any $0 < \delta \le 1/(3\gamma)$,

$$\max_{x \in \mathcal{C}} r(x) - \max_{x \in \mathcal{C}_\delta} r(x) \le (3\delta\gamma)^{\beta} (\lambda_{val} + \lambda_{cost}).$$

Proof. Let $x^* \in \arg\max_{x \in \mathcal{C}} r(x)$. We know that $(1 - 2\delta) x^* + \delta \mathbf{1} \in \mathcal{C}_\delta$, and

$$\|x^* - (1 - 2\delta) x^* - \delta \mathbf{1}\| \le \delta \|2x^* - \mathbf{1}\| \le 3\delta\gamma.$$

By Lemma 19, we then have

$$r(x^*) - r((1 - 2\delta) x^* + \delta \mathbf{1}) \le (3\delta\gamma)^{\beta} (\lambda_{val} + \lambda_{cost}).$$

Furthermore, we also know $\max_{x \in \mathcal{C}_\delta} r(x) \ge r((1 - 2\delta) x^* + \delta \mathbf{1})$, so we have shown the bound above. □

Now we focus on how to optimize the profit function $r$ over the set $\mathcal{C}_\delta$. Recall the algorithm ZOO
requires approximate evaluations for the profit function $r$. Such evaluations can be implemented using our
algorithm LearnPrice: for each bundle $\hat{x} \in \mathcal{C}_\delta$, run LearnPrice($\hat{x}, \varepsilon$) to obtain a price vector $\hat{p}$ such that
$\|\hat{x} - x^*(\hat{p})\| \le \varepsilon$, and then the resulting profit $r(x^*(\hat{p}))$ serves as an approximate evaluation for $r(\hat{x})$:

$$|r(\hat{x}) - r(x^*(\hat{p}))| \le (\lambda_{val} + \lambda_{cost})\, \varepsilon^{\beta}.$$

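Algorithm 2 Learning the price vector to optimize profit: Opro($\mathcal{C}, \alpha$)
Input: Feasible bundle space $\mathcal{C}$, and target accuracy $\alpha$
Initialize:

$$\varepsilon = \min\Big\{ \Big(\frac{\alpha}{\lambda (d + 1 + (12\gamma)^{\beta})}\Big)^{1/\beta}, \frac{1}{12\gamma} \Big\}, \qquad \delta = 4\varepsilon, \qquad \alpha' = d \lambda \varepsilon^{\beta}, \qquad \lambda = (\lambda_{val} + \lambda_{cost}),$$

restricted bundle space $\mathcal{C}_\delta = (1 - 2\delta)\,\mathcal{C} + \delta \mathbf{1}$ and number of iterations $T = \tilde{O}(d^{4.5})$
For $t = 1, \ldots, T$:

ZOO($\alpha', \mathcal{C}_\delta$) queries the profit for bundle $x^t$

Let $p^t$ = LearnPrice($x^t, \varepsilon$) and observe the induced bundle $x^*(p^t)$

Send $r(x^*(p^t))$ to ZOO($\alpha', \mathcal{C}_\delta$) as an approximate evaluation of $r(x^t)$
$\hat{x}$ = ZOO($\alpha', \mathcal{C}_\delta$)

$\hat{p}$ = LearnPrice($\hat{x}, \varepsilon$)

Output: the last price vector $\hat{p}$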
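The two-level structure of Algorithm 2 (an outer bundle optimizer that only sees profits, and an inner LearnPrice loop that only sees purchases) can be sketched end to end in one dimension. Everything here is an assumption made for illustration: the buyer with $v(x) = 2\sqrt{x}$ on $[0,1]$, the linear cost $c(x) = x$, the hand-picked step size and iteration counts, and above all the use of ternary search as a one-dimensional stand-in for ZOO, which the paper treats as a black-box bandit convex optimization routine.

```python
def consumer(p):
    """Hypothetical buyer with v(x) = 2*sqrt(x) on C = [0, 1] (a single good)."""
    return 1.0 if p <= 1.0 else 1.0 / (p * p)

def learn_price(target, eta=0.5, steps=2000, max_price=10.0):
    """One-dimensional LearnPrice: average iterate of projected descent on prices."""
    p, total = 0.0, 0.0
    for _ in range(steps):
        total += p
        p = min(max_price, max(0.0, p - eta * (target - consumer(p))))
    return total / steps

def observed_profit(p, unit_cost=1.0):
    """Profit the producer observes after posting price p: revenue minus cost."""
    x = consumer(p)
    return p * x - unit_cost * x

def optimize_profit(lo=0.05, hi=0.95, rounds=25):
    """Ternary search over target bundles; a stand-in for ZOO, not ZOO itself."""
    for _ in range(rounds):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if observed_profit(learn_price(m1)) < observed_profit(learn_price(m2)):
            lo = m1
        else:
            hi = m2
    return learn_price((lo + hi) / 2.0)

p_hat = optimize_profit()
profit = observed_profit(p_hat)
```

For this assumed instance the profit as a function of the bundle is $r(x) = \sqrt{x} - x$, maximized at $x = 1/4$ with price 2 and profit $1/4$, so `p_hat` and `profit` should land close to those values even though the code never evaluates the valuation directly.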


Theorem 21. Let $\alpha > 0$ be the target accuracy. The instantiation Opro($\mathcal{C}, \alpha$) computes a price vector $\hat{p}$
such that the expected profit satisfies

$$\mathbb{E}[r(\hat{p})] \ge \max_{p \in \mathbb{R}^d_+} r(p) - \alpha,$$

the number of times it calls the algorithm LearnPrice is bounded by $\tilde{O}(d^{4.5})$, and the total number of observations it
requires from the consumer is $\mathrm{poly}(d, 1/\alpha)$.⁴

Proof. First we show that each induced bundle $x^*(p^t)$ is in the interior $\mathrm{Int}_{\mathcal{C}}$. Note that in the algorithm, we
have $\varepsilon = \delta/4$. By the guarantee of LearnPrice in Theorem 12, we have that

$$\|x^t - x^*(p^t)\| \le \varepsilon = \delta/4.$$

By Claim 18, we know that $x^t \in \mathrm{Int}_{\mathcal{C}, \delta/2}$, so the ball of radius $\varepsilon$ centered at $x^t$ is contained in $\mathcal{C}$, and hence $x^*(p^t)$
is in the interior of $\mathcal{C}$. By Lemma 17 and Lemma 9, each vector $p^t = \nabla v(x^*(p^t))$ is the profit-maximizing
price vector for the induced bundle $x^*(p^t)$, so the profit the algorithm observes is indeed $r(x^*(p^t))$.

Next, to establish the accuracy guarantee, we need to bound two sources of error. First, we need to
bound the error from ZOO. To simplify notation, let $\lambda = (\lambda_{val} + \lambda_{cost})$. Recall from Lemma 19 that the
approximate profit evaluation $r(x^*(p^t))$ satisfies

$$|r(x^t) - r(x^*(p^t))| \le \lambda \varepsilon^{\beta}.$$

⁴ In Appendix D, we give a variant of the algorithm with query complexity scaling poly-logarithmically in $1/\alpha$.

By the accuracy guarantee in Lemma 7, the final queried bundle $\hat{x}$ satisfies

$$\mathbb{E}[r(\hat{x})] \ge \max_{x \in \mathcal{C}_\delta} r(x) - d \lambda \varepsilon^{\beta}.$$

Since we know that $|r(\hat{x}) - r(x^*(\hat{p}))| \le \lambda \varepsilon^{\beta}$, we also have

$$\mathbb{E}[r(x^*(\hat{p}))] \ge \max_{x \in \mathcal{C}_\delta} r(x) - (d + 1) \lambda \varepsilon^{\beta}.$$

Next, as we are restricting the bundle space to $\mathcal{C}_\delta$, there might be further loss of profit. Note that $\delta = 4\varepsilon \le
1/(3\gamma)$, so we can bound it with Lemma 20:

$$\mathbb{E}[r(x^*(\hat{p}))] \ge \max_{x \in \mathcal{C}} r(x) - \lambda \left[ (d + 1) \varepsilon^{\beta} + (3\delta\gamma)^{\beta} \right] = \max_{x \in \mathcal{C}} r(x) - \lambda \left[ (d + 1) \varepsilon^{\beta} + (12 \varepsilon \gamma)^{\beta} \right].$$

If we plug in our setting for the parameter $\varepsilon$, we recover the desired bound since $r(x^*(\hat{p})) = r(\hat{p})$ and
$\max_{x \in \mathcal{C}} r(x) = \max_{p \in \mathbb{R}^d_+} r(p)$.

Finally, we need to bound the total number of observations the algorithm needs from the consumer. In
each iteration, according to Theorem 12, the instantiation LearnPrice($x^t, \varepsilon$) requires a number of observations
bounded by

$$T' = d \cdot \mathrm{poly}\Big(\frac{1}{\varepsilon}, \frac{1}{\sigma}, \gamma, \lambda_{val}\Big).$$

Therefore, after plugging in $\varepsilon$, we have that the total number of observations Opro needs is bounded by

$$O(T' \times T) = \mathrm{poly}(d, 1/\alpha)$$

(hiding constants $\lambda_{cost}, \lambda_{val}, \sigma, \gamma$). □

4 General Framework of Stackelberg Games 

Now that we have worked out a concrete application of our method in the context of learning to maximize
revenue from revealed preferences, we will abstract our techniques and show how they can be used to solve a
general family of Stackelberg games in which the objective of the follower is unknown to the leader. Along
the way, we will also generalize our technique to operate in a setting in which the follower responds to
the leader's actions by only approximately maximizing her utility function. In addition to generalizing the
settings in which our approach applies, this avoids a technical concern that might otherwise arise: that
bundles maximizing strongly concave utility functions might be non-rational. In addition to being able to
handle approximations to optimal bundles that would be induced by taking a rational approximation, we
show our method is robust to much larger errors.

In our general framework, we consider a Stackelberg game that consists of a leader with action set $\mathcal{A}_L$
and a follower with action set $\mathcal{A}_F$. Each player has a utility function $U_L, U_F \colon \mathcal{A}_L \times \mathcal{A}_F \to \mathbb{R}$. In the
corresponding Stackelberg game, the leader chooses an action $p \in \mathcal{A}_L$, and then the follower chooses a
$\zeta$-best response $x'(p)$ such that

$$U_F(p, x'(p)) \ge U_F(p, x^*(p)) - \zeta,$$

where $x^*(p) = \arg\max_{x \in \mathcal{A}_F} U_F(p, x)$ is the follower's exact best response. Note that when $\zeta = 0$, $x'(p) =
x^*(p)$.
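The definition of a $\zeta$-best response translates directly into a check on the follower's utility gap. The sketch below uses a hypothetical one-dimensional follower with $U_F(p, x) = 2\sqrt{x} - p x$ on $[0, 1]$; the function names and the numbers are illustrative assumptions.

```python
import math

def follower_utility(p, x):
    """U_F(p, x) = phi(x) - p*x with the hypothetical choice phi(x) = 2*sqrt(x)."""
    return 2.0 * math.sqrt(x) - p * x

def exact_best_response(p):
    """Maximizer of U_F(p, .) over the action set [0, 1]."""
    return 1.0 if p <= 1.0 else 1.0 / (p * p)

def utility_gap(p, x):
    """Utility the follower gives up by playing x instead of x*(p)."""
    return follower_utility(p, exact_best_response(p)) - follower_utility(p, x)

def is_zeta_best_response(p, x, zeta):
    """True if x is a zeta-best response to the leader action p."""
    return utility_gap(p, x) <= zeta

p, x = 2.0, 0.3          # the exact best response to p = 2 is 0.25
gap = utility_gap(p, x)
distance = abs(x - exact_best_response(p))
```

Since this assumed $\phi$ is strongly concave with parameter $1/2$ on $(0, 1]$, a small utility gap forces the approximate response to be close to the exact one: here `distance` is at most `2 * math.sqrt(gap)`, which is the kind of relationship the general analysis relies on.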

The example of maximizing revenue from revealed preferences is a special case of this framework. The
producer is the leader and his action space consists of prices $p$, and the follower is the consumer and her
action space is the bundle $x$ she purchases. The producer's utility for a pair $(p, x)$ is his revenue minus the
cost of producing $x$, and the consumer's utility is her value for $x$ minus the price she pays.

In general, we consider solving the leader's optimization problem: find $p \in \mathcal{A}_L$ such that $U_L(p, x^*(p))$
is (approximately) maximized. Formally, we consider a sub-class of Stackelberg games that have the following
structure.

Definition 22. An instance is a Stackelberg game $\mathcal{S}(\mathcal{A}_L, \mathcal{A}_F, \phi)$ which consists of two players, the leader
and the follower, such that:

• the leader has action set $\mathcal{A}_L \subseteq \mathbb{R}^d$, the follower has action set $\mathcal{A}_F \subseteq \mathbb{R}^d$, both of which are convex
and compact;

• the follower's utility function $U_F \colon \mathcal{A}_L \times \mathcal{A}_F \to \mathbb{R}$ takes the form

$$U_F(p, x) = \phi(x) - \langle p, x \rangle,$$

where $\phi \colon \mathbb{R}^d \to \mathbb{R}$ is a strongly concave, differentiable function unknown to the leader;

• the leader's utility function $U_L \colon \mathcal{A}_L \times \mathcal{A}_F \to \mathbb{R}$ is an unknown function.

The optimization problem associated with the game instance is $\max_{p \in \mathcal{A}_L} U_L(p, x^*(p))$.

Our first step to solve the problem is to rewrite the leader's utility function so that it can be expressed as
a function only of the follower's action. For each action of the follower $x \in \mathcal{A}_F$, the set of leader's actions
that induce $x$ is

$$P^*(x) = \{p \in \mathcal{A}_L \mid x^*(p) = x\}.$$

Among all of the leader's actions that induce $x$, the optimal one is:

$$p^*(x) = \arg\max_{p \in P^*(x)} U_L(p, x),$$

where ties are broken arbitrarily. We can then rewrite the leader's objective as a function of only $x$:

$$\psi(x) = U_L(p^*(x), x). \qquad (10)$$

Note that to approximately solve the leader's optimization problem, it is sufficient to find the follower's
action $\hat{x} \in \mathcal{A}_F$ which approximately optimizes $\psi(\cdot)$, together with the action $\hat{p} \in \mathcal{A}_L$ that approximately
induces $\hat{x}$. Before we present the algorithm, we state the assumptions on the utility functions of the two
players that we will need.

Assumption 4.1. The game $\mathcal{S}(\mathcal{A}_L, \mathcal{A}_F, \phi)$ satisfies the following properties.

1. The function $\psi \colon \mathcal{A}_F \to \mathbb{R}$ defined in (10) is concave and $\lambda_L$-Lipschitz;

2. The function $\phi \colon \mathcal{A}_F \to \mathbb{R}$ is non-decreasing, $\sigma$-strongly concave and $\lambda_F$-Lipschitz;

3. The action space of the leader $\mathcal{A}_L$ contains the following set

$$\mathcal{P} = \{p \in \mathbb{R}^d_+ \mid \|p\| \le \sqrt{d}\, \lambda_F\}; \qquad (11)$$

4. The action space of the follower $\mathcal{A}_F$ has bounded diameter, $\|\mathcal{A}_F\| \le \gamma$.

4.1 Inducing a Target Action of the Follower

We first consider the following sub-problem. Given a target action $\hat{x}$ of the follower, we want to learn an
action $\hat{p}$ for the leader such that the induced action satisfies

$$\|x'(\hat{p}) - \hat{x}\| \le \varepsilon.$$

We now give an algorithm to learn $\hat{p}$ that requires only polynomially many observations of the follower's
$\zeta$-approximate best responses.


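Algorithm 3 Learning the leader's action to induce a target follower's action: LearnLead($\hat{x}, \varepsilon$)
Input: A target follower action $\hat{x} \in \mathcal{A}_F$, and target accuracy $\varepsilon$
Initialize: restricted action space $\mathcal{P} = \{p \in \mathbb{R}^d_+ \mid \|p\| \le \sqrt{d}\, \lambda_F\}$

$$p^1_j = 0 \text{ for all } j \in [d], \qquad T = \Big(\frac{4\sqrt{2}\sqrt{d}\, \lambda_F \gamma}{\varepsilon^2 \sigma - 4\zeta}\Big)^2, \qquad \eta = \frac{\sqrt{2}\,\gamma}{\sqrt{d}\, \lambda_F \sqrt{T}}$$

For $t = 1, \ldots, T$:

Observe the induced action by the follower $x^*(p^t)$

Update leader's action:

$\tilde{p}^{t+1}_j = p^t_j - \eta\,(\hat{x}_j - x^*(p^t)_j)$ for each $j \in [d]$, $\quad p^{t+1} = \Pi_{\mathcal{P}}[\tilde{p}^{t+1}]$


Output: $\hat{p} = \frac{1}{T} \sum_{t=1}^{T} p^t$.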

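Theorem 23. Let $\hat{x} \in \mathcal{A}_F$ be a target follower action and $\varepsilon > 0$. Then LearnLead($\hat{x}, \varepsilon$) outputs a leader
action $\hat{p}$ such that the induced follower action satisfies $\|\hat{x} - x'(\hat{p})\| \le \varepsilon$, and the number of observations it
needs is no more than

$$T = O\Big(\frac{d \lambda_F^2 \gamma^2}{\varepsilon^4 \sigma^2}\Big)$$

as long as $\varepsilon \ge 2\sqrt{2\zeta/\sigma}$.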


4.2 Optimizing Leader’s Utility 

Now that we know how to approximately induce any action of the follower using LearnLead, we are ready
to give an algorithm to optimize the leader's utility function $U_L$. Recall that we can write $U_L$ as a
function $\psi$ that depends only on the follower's action. In order to obtain the approximately optimal utility
$\psi(\hat{x})$, the leader must play the optimal action $p$ that induces the follower to play approximately $\hat{x}$.

Assumption 4.2. For any $\hat{x} \in \mathcal{A}_F$ and $\varepsilon > 0$, the instantiation LearnLead($\hat{x}, \varepsilon$) returns $\hat{p}$ such that

$$\hat{p} = p^*(x^*(\hat{p})).$$

Whenever this assumption holds, we can use LearnLead to allow the leader to obtain utility $U_L(\hat{p}, x^*(\hat{p})) =
\psi(x^*(\hat{p}))$.

While Assumption 4.2 appears to be quite strong, we can often achieve it. Recall that we were able to satisfy
Assumption 4.2 in our revealed preferences application by operating in the interior of the feasible region of the follower's
action space, and we can similarly do this in our principal-agent example. Moreover, it is trivially satisfied
whenever the leader's objective function depends only on the follower's action, since in this case, every
leader-action $p$ which induces a particular follower-action $x$ is optimal. This is the case, for example, in our
routing games application in Section 5.

Now we will show how to use the algorithm ZOO to find an approximately optimal point for the function
$\psi$. First, we will use LearnLead to provide approximate function evaluations for $\psi$ at each $x \in \mathcal{A}_F$: our
algorithm first runs LearnLead($x, \varepsilon$) to learn a leader action $\hat{p}$, and we will use the observed function value on
the induced follower's approximate best response $\psi(x'(\hat{p}))$ as an approximation for $\psi(x)$. Since LearnLead
guarantees that $\|x'(\hat{p}) - x\| \le \varepsilon$, by the Lipschitz property of $\psi$ we have

$$|\psi(x) - \psi(x'(\hat{p}))| \le \lambda_L \varepsilon.$$

With these approximate evaluations, ZOO can then find a $(d \varepsilon \lambda_L)$-approximate optimizer of $\psi$ with only
$\tilde{O}(d^{4.5})$ iterations by Lemma 7. The full algorithm is presented in Algorithm 4.


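Algorithm 4 Leader learns to optimize: LearnOpt($\mathcal{A}_F, \alpha$)

Input: Follower action space $\mathcal{A}_F$, and target accuracy $\alpha$

Initialize: number of iterations $T = \tilde{O}(d^{4.5})$ and $\varepsilon = \frac{\alpha}{\lambda_L (d + 1)}$

For $t = 1, \ldots, T$:

ZOO($d \varepsilon \lambda_L, \mathcal{A}_F$) queries the objective value for action $x^t \in \mathcal{A}_F$
Let $p^t$ = LearnLead($x^t, \varepsilon$) and observe the induced action $x'(p^t)$

Send $\psi(x'(p^t))$ to ZOO($d \varepsilon \lambda_L, \mathcal{A}_F$) as an approximate evaluation of $\psi(x^t)$
$\hat{x}$ = ZOO($d \varepsilon \lambda_L, \mathcal{A}_F$)

$\hat{p}$ = LearnLead($\hat{x}, \varepsilon$)

Output: the leader action $\hat{p}$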


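Theorem 24. Let $\alpha > 0$ be the target accuracy. The instantiation LearnOpt($\mathcal{A}_F, \alpha$) computes a leader
action $\hat{p}$ along with its induced follower action $x^*(\hat{p})$ that satisfies

$$\mathbb{E}[U_L(\hat{p}, x^*(\hat{p}))] \ge \max_{p \in \mathcal{A}_L} U_L(p, x^*(p)) - \alpha,$$

and the number of observations the algorithm requires of the follower is bounded by

$$\tilde{O}\Big(\frac{d^{9.5}}{\alpha^4}\Big)$$

(hiding the constants $\lambda_F, \lambda_L, \sigma, \gamma$), as long as $\alpha \ge \Omega\big(d \lambda_L \sqrt{\zeta/\sigma}\big)$.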

5 Optimal Traffic Routing from Revealed Behavior 

In this section, we give the second main application of our technique discussed in the introduction: how to
find tolls to induce an approximately optimal flow in a non-atomic traffic routing game when the latency
functions are unknown.

A nonatomic routing game $\mathcal{G}(G, \ell, \mathcal{D})$ is defined by a graph $G = (V, E)$, a latency function $\ell_e$ on each
edge $e \in E$, and the source, destination and demands for $n$ commodities: $\mathcal{D} = \{(s_i, t_i, k_i)\}_{i \in [n]}$. The
latency function $\ell_e \colon \mathbb{R}_+ \to [0,1]$ represents the delay on each edge $e$ as a function of the total flow on that
edge. For simplicity, we assume $\sum_{i=1}^{n} k_i = 1$, and we let $m$ denote the number of edges $|E|$.

For each commodity $i$, the demand $k_i$ specifies the volume of flow from $s_i$ to $t_i$ routed by (self-interested)
agents. The game is nonatomic: infinitely many agents each control only an infinitesimal amount of flow
and each agent of type $i$ selects an action (an $s_i$-$t_i$ path) so as to minimize her total latency. The aggregate
decisions of the agents induce a multicommodity flow $(f^i)_{i \in [n]}$, with each vector $f^i = (f^i_e)_{e \in E} \in \mathcal{F}_i$, where
$\mathcal{F}_i$ is the flow polytope for the $i$'th commodity:

$$\mathcal{F}_i = \Big\{ f^i \in \mathbb{R}^m_+ \;\Big|\; \sum_{(v,w) \in E} f^i_{vw} = \sum_{(u,v) \in E} f^i_{uv} \;\; \forall v \in V \setminus \{s_i, t_i\}, \quad \sum_{(s_i, w) \in E} f^i_{s_i w} - \sum_{(u, s_i) \in E} f^i_{u s_i} = k_i \Big\}.$$

Let $\mathcal{F} = \{f = \sum_{i=1}^{n} f^i \mid f^i \in \mathcal{F}_i \text{ for each } i\}$ denote the set of feasible flows. A flow $f$ defines a latency
$\ell_e(f_e)$ on each edge $e$. Given a path $P$, we write $\ell_P(f) = \sum_{e \in P} \ell_e(f_e)$ to denote the sum latency on all
edges in the path. A Nash or Wardrop equilibrium is defined as follows:

Definition 25 (Wardrop equilibrium). A multicommodity flow $f$ is a Wardrop equilibrium of a routing game
if it is feasible and for every commodity $i$, and for all $s_i$-$t_i$ paths $P, Q$ with $f^i_P > 0$, we have $\ell_P(f) \le \ell_Q(f)$.

Crucial to our application is the following well known lemma, which states that a Wardrop equilibrium
can be found as the solution to an optimization problem (convex whenever the latencies are non-decreasing),
which minimizes a potential function associated with the routing game.

Lemma 26 ([MS96]). A Wardrop equilibrium can be computed by solving the following optimization problem:

$$\min_{f \in \mathcal{F}} \Phi(f) := \sum_{e \in E} \int_0^{f_e} \ell_e(x)\, dx.$$

Whenever the latency functions $\ell_e$ are each non-decreasing, this is a convex program. We call $\Phi$ the potential
function of the routing game.
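Lemma 26 can be seen at work on a tiny network. The sketch below assumes a hypothetical instance with one commodity of unit demand and two parallel links with latencies $\ell_1(x) = x$ and $\ell_2(x) = 0.5 + 0.5x$; it minimizes the potential by ternary search (our choice of solver, valid here because the potential is convex in the single free variable) and then checks the Wardrop condition that both used links have equal latency.

```python
def potential(f1):
    """Phi(f) for two parallel links with l1(x) = x and l2(x) = 0.5 + 0.5*x,
    unit demand, so the flow on the second link is f2 = 1 - f1."""
    f2 = 1.0 - f1
    return 0.5 * f1 ** 2 + (0.5 * f2 + 0.25 * f2 ** 2)

def argmin_convex(fn, lo=0.0, hi=1.0, rounds=100):
    """Ternary search for the minimizer of a convex function on [lo, hi]."""
    for _ in range(rounds):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if fn(m1) > fn(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0

f1 = argmin_convex(potential)
latency_1 = f1
latency_2 = 0.5 + 0.5 * (1.0 - f1)
```

For this assumed instance the potential minimizer routes $2/3$ of the flow on the first link, at which point both links have latency $2/3$, so no agent can reduce her delay by switching.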


Now suppose there is a municipal authority which administers the network and wishes to minimize the
social cost of the equilibrium flow:

$$\Psi(f) = \sum_{e \in E} f_e \cdot \ell_e(f_e).$$

The authority has the power to impose constant tolls on the edges. A toll vector $\tau = (\tau_e)_{e \in E} \in \mathbb{R}^m_+$ induces
a new latency function on each edge: $\ell^{\tau}_e(f_e) = \ell_e(f_e) + \tau_e$, which gives rise to a different routing game
$\mathcal{G}(G, \ell^{\tau}, \mathcal{D})$ with a new potential function $\Phi^{\tau}$. In particular, the equilibrium flow $f^*(\tau)$ induced by the toll
vector is the Wardrop equilibrium of the tolled routing game:

$$f^*(\tau) = \arg\min_{f \in \mathcal{F}} \Phi^{\tau}(f) = \arg\min_{f \in \mathcal{F}} \Big[ \sum_{e \in E} \int_0^{f_e} (\ell_e(x) + \tau_e)\, dx \Big] = \arg\min_{f \in \mathcal{F}} \Big[ \Phi(f) + \sum_{e \in E} \tau_e \cdot f_e \Big].$$
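To see how tolls change the equilibrium, consider again a hypothetical two-link network with unit demand and latencies $\ell_1(x) = x$ and $\ell_2(x) = 0.5 + 0.5x$. Minimizing $\Phi(f) + \langle \tau, f \rangle$ over the single free variable gives a closed form for the tolled equilibrium, which the sketch below uses; the instance and the particular toll values are illustrative assumptions.

```python
def equilibrium_flow(toll_1, toll_2):
    """Flow on link 1 at the tolled Wardrop equilibrium. Setting the derivative of
    Phi(f) + <tau, f> to zero gives 1.5*f1 - 1 + toll_1 - toll_2 = 0; the result is
    clipped to [0, 1] because the objective is convex in f1."""
    f1 = (1.0 + toll_2 - toll_1) / 1.5
    return min(1.0, max(0.0, f1))

def social_cost(f1):
    """Psi(f) = sum_e f_e * l_e(f_e) with l1(x) = x and l2(x) = 0.5 + 0.5*x."""
    f2 = 1.0 - f1
    return f1 * f1 + f2 * (0.5 + 0.5 * f2)

untolled = equilibrium_flow(0.0, 0.0)
tolled = equilibrium_flow(0.5, 0.25)   # hypothetical tolls chosen for illustration
improvement = social_cost(untolled) - social_cost(tolled)
```

Without tolls the equilibrium sends $2/3$ of the flow over the first link; with the assumed tolls it sends $1/2$, which is the minimizer of the social cost for this instance, so `improvement` is positive.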


While the latency functions are unknown to the authority, his goal is to find a toll vector $\hat{\tau}$ such that the
induced flow $f^*(\hat{\tau})$ approximately minimizes the total congestion function $\Psi$.

We can formulate this problem as an instance of the type of Stackelberg game we defined in Definition 22,
where the authority is the leader, and there is a single "flow" player minimizing the game's potential
function, serving the role of the follower. We will refer to them as the toll player and the flow player
respectively. In our setting:

1. The toll player has action set $\tau \in \mathbb{R}^m_+$ and the flow player has action set $\mathcal{F}$;

2. The flow player has a utility function $U_F \colon \mathbb{R}^m_+ \times \mathcal{F} \to \mathbb{R}$ of the form

$$U_F(\tau, f) = -\Phi(f) - \langle \tau, f \rangle;$$

3. The toll player has a utility function $U_L \colon \mathbb{R}^m_+ \times \mathcal{F} \to \mathbb{R}$ of the form

$$U_L(\tau, f) = -\Psi(f).$$

Now we will apply the tools in Section 4 to solve this problem. Before we begin, we will impose the
following assumptions on the latency functions to match with Assumption 4.1. We need two types of assumptions: one
set to let us find tolls to induce a target flow, and another to guarantee that once we can induce such flows
(and hence implement a "flow cost oracle"), we can optimize over flows.

To find tolls to induce a target flow, we require that the potential function $\Phi$ be strongly convex in the
flow variables. The following conditions are sufficient to guarantee this:

Assumption 5.1. For each edge $e \in E$, $\ell_e$ is differentiable and has derivative bounded away from zero:
there exists some $\sigma > 0$ such that for all $x \in [0,1]$, $\ell'_e(x) \ge \sigma$.

Recall that the potential function $\Phi$ is a function of $m$ variables $(f_e)_{e \in E}$, and its Hessian $\nabla^2 \Phi$ at
each $f \in \mathcal{F}$ is a diagonal matrix with entries $\ell'_e(f_e) \ge \sigma$. Therefore, we know that $\nabla^2 \Phi(f) \succeq \sigma I$ for
any $f \in \mathcal{F}$, and so under Assumption 5.1, $\Phi$ is a $\sigma$-strongly convex function over $\mathcal{F}$. Note that the only
condition we really require is that the potential function be strongly convex, and there are weaker conditions
that imply this, but we state Assumption 5.1 because of its simplicity.

Once we can implement a flow oracle, we need to be able to use a bandit convex optimization algorithm
to optimize social cost over flows. Hence, we require that the social cost function be convex and Lipschitz.
The following assumptions are sufficient to guarantee this:

Assumption 5.2. For each edge $e \in E$, $\ell_e$ is convex and $(\lambda/m)$-Lipschitz continuous over $[0,1]$.

Note that this guarantees that $\Psi$ is $\lambda$-Lipschitz over $\mathcal{F}$.

We first show that we can use the algorithm LearnLead to learn a toll vector to induce any flow as a
Wardrop equilibrium.

Lemma 27. Fix any non-atomic routing game satisfying Assumption 5.1. Let $\hat{f} \in \mathcal{F}$ be a target flow and
$\varepsilon > 0$. Then the instantiation LearnLead($\hat{f}, \varepsilon$) outputs a toll vector $\hat{\tau}$ such that the induced Wardrop
equilibrium flow $f^*(\hat{\tau})$ satisfies $\|\hat{f} - f^*(\hat{\tau})\| \le \varepsilon$, and the number of observations on the flow behavior it
needs is no more than

$$T = O\Big(\frac{m^3}{\varepsilon^4 \sigma^2}\Big).$$

Proof. Before we apply Theorem 23, we still need to show that the potential function $\Phi$ of the original
routing game (without tolls) is Lipschitz over $\mathcal{F}$. Note that this does not require any assumptions on the
latency functions $\ell_e$ other than that they are bounded in $[0,1]$. Let $f, g \in \mathcal{F}$; then we can write

$$\begin{aligned}
|\Phi(f) - \Phi(g)| &= \Big| \sum_{e \in E} \Big( \int_0^{f_e} \ell_e(x)\, dx - \int_0^{g_e} \ell_e(x)\, dx \Big) \Big| \\
&= \Big| \sum_{e \in E} \int_{g_e}^{f_e} \ell_e(x)\, dx \Big| \\
&\le \sum_{e \in E} \max\{\ell_e(f_e), \ell_e(g_e)\}\, |f_e - g_e| \\
&\le \sum_{e \in E} |f_e - g_e| \le \sqrt{m}\, \|f - g\|,
\end{aligned}$$

where the last inequality follows from the fact that $\|x\|_1 \le \sqrt{m}\, \|x\|_2$ for any $x \in \mathbb{R}^m$. Also, observe that
each flow vector in $\mathcal{F}$ has norm bounded by $\sqrt{m}$. Therefore, we know that $\Phi$ is a $\sqrt{m}$-Lipschitz function.
Then we can instantiate Theorem 23 (with $d = m$, $\lambda_F = \sqrt{m}$ and $\gamma = \sqrt{m}$) and obtain the result above. □
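The toll-learning procedure behind Lemma 27 can be sketched on a small instance. Below, the authority observes only equilibrium flows and updates tolls with the LearnLead rule $\tau^{t+1} = \Pi[\tau^t - \eta(\hat{f} - f^*(\tau^t))]$. The two-link network (latencies $\ell_1(x) = x$, $\ell_2(x) = 0.5 + 0.5x$, unit demand), the closed-form `equilibrium` oracle standing in for observed traffic, and the hand-picked `eta`, `steps` and `radius` are all assumptions for illustration.

```python
import math

def equilibrium(tolls):
    """Observed Wardrop equilibrium [f1, f2] of the hypothetical two-link network
    with l1(x) = x and l2(x) = 0.5 + 0.5*x under the given tolls."""
    f1 = min(1.0, max(0.0, (1.0 + tolls[1] - tolls[0]) / 1.5))
    return [f1, 1.0 - f1]

def learn_tolls(target, eta=0.5, steps=1000, radius=5.0):
    """LearnLead-style projected descent on tolls; returns the average iterate."""
    tolls, total = [0.0, 0.0], [0.0, 0.0]
    for _ in range(steps):
        total = [s + t for s, t in zip(total, tolls)]
        flow = equilibrium(tolls)  # feedback: the flow induced by current tolls
        moved = [max(0.0, t - eta * (fh - fe))
                 for t, fh, fe in zip(tolls, target, flow)]
        norm = math.sqrt(sum(t * t for t in moved))
        tolls = moved if norm <= radius else [t * radius / norm for t in moved]
    return [s / steps for s in total]

target = [0.5, 0.5]
tolls_hat = learn_tolls(target)
induced = equilibrium(tolls_hat)
```

A link carrying more flow than its target has its toll raised, and a link carrying less has its toll lowered (never below zero). On this assumed instance the learned tolls charge the first link more than the second, and the induced flow should be close to the even split requested.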

Now we can instantiate Theorem 24 and show that LearnOpt can find a toll vector that induces the
approximately optimal flow.


Pre-processing Step. The set $\mathcal{F}$ is not a well-rounded convex body in $\mathbb{R}^m$ (it has zero volume), so we will
have to apply the following standard pre-processing step to transform it into a well-rounded body. First,
we find a maximal set $X$ of linearly independent points in $\mathcal{F}$. We will then embed the polytope $\mathcal{F}$ into the
lower-dimensional subspace spanned by $X$, so that $\mathcal{F}$ becomes full-dimensional. In this subspace, $\mathcal{F}$ is a
convex body with a relative interior. Next, we apply the transformation of [LV06] to transform $\mathcal{F}$ into a
well-rounded body within Span($X$).⁵ We will run ZOO over the transformed body.

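Lemma 28. Let $\alpha > 0$ be the target accuracy. The instantiation LearnOpt($\mathcal{A}_F$, $\alpha$) computes a toll vector
$\hat\tau$ such that the induced flow $\hat f = f^*(\hat\tau)$ is $\alpha$-approximately optimal in expectation:

$$\mathbb{E}\left[\Psi(\hat f)\right] \le \min_{f \in \mathcal{F}} \Psi(f) + \alpha.$$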


The total number of observations we need on the flow behavior is bounded by 



Remark 29. Just as with the profit maximization example, if we do not require noise tolerance, then we can
improve the dependence on the approximation parameter $\alpha$ to be polylogarithmic. We show how to do this
in the appendix.

5 See Section 5 of [LV06] for details of the rounding algorithm.
6 The Principal-Agent Problem


Our general framework applies even when the leader observes only the noisy feedback that arises when the 
follower only approximately maximizes her utility function. This corresponds to adversarially chosen noise 
of bounded magnitude. In this section, we show how to handle the natural setting in which the noise being 
added need not be bounded, but is well behaved - specifically has mean 0, and bounded variance. This can 
be used to model actual noise in an interaction, rather than a failure to exactly maximize a utility function. 
As a running example as we work out the details, we will discuss a simple principal-agent problem related 
to our profit-maximization example. 

In a principal-agent problem, the principal (the leader) defines a contract by which the agent (the follower)
will be paid, as a function of the work produced by the agent. The key property of principal-agent
problems is that the agent is not able to deterministically produce work of a given quality. Instead, the agent
chooses (and experiences cost as a function of) a level of effort, which stochastically maps to the quality of
his work. However, the effort chosen by the agent is unobservable to the principal, who sees only the quality
of the finished product.

We consider a simple $d$-dimensional principal-agent problem, in which the result of the agent can be
evaluated along $d$ dimensions, each of which might require a different amount of effort. Since the agent
knows how effort is stochastically mapped to realizations, we abstract away the agent’s choice of an “effort”
vector, and instead (without loss of generality) view the agent as choosing a “target contribution”
$x \in C \subseteq \mathbb{R}^d_+$, the expected value of the agent’s ultimate contribution. The agent experiences some strongly convex
cost $c(x)$ for producing a target contribution of $x$, but might nevertheless be incentivized to produce high
quality contributions by the contract offered by the principal. However, the contribution that is actually
realized (and that the principal observes) is a stochastically perturbed version of $x$: $\tilde x = x + \theta$, where
$\theta \in \mathbb{R}^d$ is a noise vector sampled from the mean-zero Gaussian distribution $\mathcal{N}(0, I)$.

The principal wants to optimize over the set of linear contracts: he will choose a price vector $p \in \mathbb{R}^d_+$,
such that in response to the agent’s realized contribution $\tilde x$, the agent collects reward $\langle p, \tilde x\rangle$. His goal is to
choose a price vector to optimize his expected value for the agent’s contribution, minus his own costs.

The agent’s strongly convex cost function $c\colon C \to \mathbb{R}_+$ is unknown to the principal. If the principal’s
contract vector is $p$ and the agent attempts to contribute $x$, then his utility is

$$U_a(p, x) = \langle p, x + \theta\rangle - c(x),$$

and his expected utility is just $u_a(p, x) = \mathbb{E}[U_a(p, x)] = \langle p, x\rangle - c(x)$. Fixing any price $p$, the agent will
attempt to play the induced contribution vector $x^*(p) = \arg\max_{x \in C}\left(\langle p, x\rangle - c(x)\right)$ in order to optimize
his expected utility.

The principal has value $v_i$ for each unit of contribution in the $i$-th dimension, and upon observing the
realized contribution $\tilde x$, his utility is

$$u_p(p, \tilde x) = \langle v, \tilde x\rangle - \langle p, \tilde x\rangle = \langle v - p, \tilde x\rangle.$$

The principal’s goal is to find a price vector $\hat p$ to (approximately) maximize his expected utility:

$$\mathbb{E}\left[u_p(p, x^*(p) + \theta)\right] = \mathbb{E}\left[\langle v - p, x^*(p) + \theta\rangle\right] = \langle v - p, x^*(p)\rangle.$$
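
To make the objective above concrete, here is a minimal Python sketch of a toy instance. It assumes a quadratic cost $c(x) = \|x\|^2/2$ on $C = [0,1]^d$ (chosen only because the best response has a closed form; it is not claimed to satisfy every assumption made later), and the names `best_response`, `expected_utility`, and `grid_argmax` are hypothetical. In this instance $x^*(p) = p$ on the box, so the expected utility $\langle v - p, p\rangle$ is maximized at $p = v/2$, which the grid search recovers.

```python
def best_response(p):
    """Agent's target contribution for cost c(x) = ||x||^2 / 2 on C = [0, 1]^d.

    Maximizing <p, x> - c(x) coordinate-wise gives x_i = p_i, clipped to [0, 1].
    """
    return [min(1.0, max(0.0, pi)) for pi in p]


def expected_utility(v, p):
    """Principal's expected utility <v - p, x*(p)> (the noise has mean zero)."""
    x = best_response(p)
    return sum((vi - pi) * xi for vi, pi, xi in zip(v, p, x))


def grid_argmax(v, steps=100):
    """Brute-force the best price per coordinate on a uniform grid over [0, 1].

    The objective is separable across coordinates in this toy instance, so each
    coordinate can be searched independently.
    """
    grid = [k / steps for k in range(steps + 1)]
    return [max(grid, key=lambda q, vi=vi: (vi - q) * min(1.0, max(0.0, q)))
            for vi in v]


v = [0.6, 0.2, 0.4]          # hypothetical valuation vector with ||v|| <= 1
p_hat = grid_argmax(v)       # expected to land on v / 2 in this toy instance
best_value = expected_utility(v, p_hat)
```

The closed form is specific to the quadratic cost; for an unknown cost the principal cannot compute $x^*(p)$ directly, which is what the learning procedure in this section addresses.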

This is an instantiation of our class of Stackelberg games in which the principal is the leader with action
set $\mathbb{R}^d_+$ and utility function $\psi(p, x) = \langle v - p, x\rangle$, and the agent is the follower with action set $C$ and
utility function $\phi(p, x) = \langle p, x\rangle - c(x)$. Indeed, in expectation, it is merely a “procurement” version of
our profit-maximization example. However, the crucial difference in this application (causing it to deviate
from the general setting defined in Definition 22) is that the leader only gets to observe a noisy version
of the follower’s best response at each round: $\tilde x = x^*(p) + \theta$. We will adapt the analysis from Section 3
and Section 4 to show that our algorithm is robust to noisy observations. We make the following assumptions,
which correspond to the set of assumptions we made in our previous applications.

Assumption 6.1. The following assumptions parallel Assumptions 4.1 and 3.1.

1. The set of feasible contributions $C \subseteq \mathbb{R}^d_+$ is convex, closed, and bounded. It also contains the unit
hypercube, $[0,1]^d \subseteq C$ (the agent can simultaneously attempt to contribute at least one unit in each
dimension), and in particular contains $0 \in \mathbb{R}^d$ (the agent can contribute nothing). Lastly, $\|C\|_2 \le \gamma$;

2. the agent’s cost function $c$ is homogeneous, 1-Lipschitz and $\sigma$-strongly convex;

3. the principal’s valuation vector has norm $\|v\| \le 1$.

6.1 Inducing the Agent’s Contribution Using Noisy Observations 

We will first show that in general, LearnLead can learn the leader’s action which approximately induces
any target follower action $\hat x$ even if the algorithm only observes noisy perturbed best responses from the
follower. This result holds in full generality, but we illustrate it by using the principal-agent problem.

First, given any target contribution $\hat x$, consider the following convex program similar to Section 3.4:

$$\min_{x \in C}\; c(x) \qquad (12)$$

$$\text{such that}\quad x_j \ge \hat x_j \ \text{ for every } j \in [d] \qquad (13)$$

The Lagrangian of the program is

$$\mathcal{L}(x, p) = c(x) + \langle p, x - \hat x\rangle,$$

and the Lagrangian dual objective function is

$$g(p) = \min_{x \in C} \mathcal{L}(x, p).$$

By the same analysis used in the proof of Lemma 14, if we find a price vector $\hat p \in \mathcal{P}$ such that
$g(\hat p) \ge \max_{p \in \mathcal{P}} g(p) - \alpha$, then we know that the induced contribution vector $x^*(\hat p)$ satisfies $\|x^*(\hat p) - \hat x\| \le \sqrt{2\alpha/\sigma}$.
Now we show how to (approximately) optimize the function $g$ based on the realized contributions of the
agent, which correspond to mean-zero perturbations of the agent’s best response.

As shown in Lemma 15, a subgradient of $g$ at price $p$ is $(x^*(p) - \hat x)$, but now since the principal only
observes the realized contribution vector $\tilde x$, our algorithm does not have access to subgradients. However, we
can still obtain an unbiased estimate of the subgradient: the vector $(\tilde x - \hat x)$ satisfies $\mathbb{E}[\tilde x - \hat x] = (x^*(p) - \hat x)$
because the noise vector is drawn from $\mathcal{N}(0, I)$. This is sufficient to allow us to analyze LearnLead as
stochastic gradient descent. The principal does the following: initialize $p^1 = 0$ and at each round $t \in [T]$,
observe a realized contribution vector $\tilde x^t = x^*(p^t) + \theta^t$ and update the contract prices as follows:

$$p^{t+1} = \Pi_{\mathcal{P}}\left[p^t + \eta\,(\tilde x^t - \hat x)\right],$$

where each $\theta^t \sim \mathcal{N}(0, I)$, $\eta$ is a learning rate and $\mathcal{P} = \{p \in \mathbb{R}^d_+ \mid \|p\| \le \sqrt{d}\}$. Finally, the algorithm
outputs the average price vector $\hat p = \frac{1}{T}\sum_{t=1}^T p^t$. We use the following standard theorem about the
convergence guarantee for stochastic gradient descent (a more general result can be found in [NJLS09]).





Lemma 30. With probability at least $1 - \beta$, the average vector $\hat p$ output by stochastic gradient descent
satisfies


max g(p) - g(p) < O 

pev 


(-f + Vdlog f ^ ) 


Vt 



Algorithm 5 Learning the price vector from noisy observations: LearnPriceN($\hat x$, $\varepsilon$, $\beta$)
Input: A target contribution $\hat x \in C$, target accuracy $\varepsilon$, and confidence parameter $\beta$
Initialize: restricted price space $\mathcal{P} = \{p \in \mathbb{R}^d_+ \mid \|p\| \le \sqrt{d}\}$

p) = 0 for all j e[d\ T = 0 ) 


_Vh_ 

VdVr 


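For $t = 1, \ldots, T$:

    Observe the realized contribution by the agent $\tilde x^t = x^*(p^t) + \theta$, where $\theta \sim \mathcal{N}(0, I)$
    Update price vector:

    $\tilde p^{t+1}_j = p^t_j + \eta\,(\tilde x^t_j - \hat x_j)$ for each $j \in [d]$, $\quad p^{t+1} = \Pi_{\mathcal{P}}\left[\tilde p^{t+1}\right]$


Output: $\hat p = \frac{1}{T}\sum_{t=1}^T p^t$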


Lemma 31. Let $\hat x \in C$ be any target contribution vector. Then, with probability at least $1 - \beta$, the algorithm
LearnPriceN($\hat x$, $\varepsilon$, $\beta$) outputs a contract price vector $\hat p$ for the principal such that the induced contribution
vector $x^*(\hat p)$ satisfies

$$\|\hat x - x^*(\hat p)\| \le \varepsilon,$$

and the number of observations on the realized contributions of the agent it needs is no more than

$$T = O\!\left(\frac{d\gamma^2}{\varepsilon^4\sigma^2}\right).$$


6.2 Optimizing the Principal’s Utility 

Finally, we show how to optimize the principal’s utility by combining LearnPriceN and ZOO.

Following the same analysis as in Lemma 9, we know that the principal’s utility-maximizing price
vector to induce expected contribution $\hat x$ is $\nabla c(\hat x)$. We can then rewrite the expected utility of the principal
as a function of the attempted contribution of the agent:

$$u_p(x) = \langle v - \nabla c(x), x\rangle.$$

Since $c$ is a homogeneous and convex function, by Theorem 10, $u_p$ is a concave function.

Similar to Section 3.5, we will run ZOO to optimize over the interior subset:

$$C_\delta = (1 - 2\delta)\,C + \delta\mathbf{1},$$

so any price vector $\hat p$ given by LearnPriceN is the unique price that induces the agent’s attempted
contribution vector $x^*(\hat p)$ (Lemma 17). By the same analysis as in Lemma 20, we know that there is little loss in
the principal’s utility by restricting the contribution vectors to $C_\delta$.
Lemma 32. The function $u_p\colon C \to \mathbb{R}$ is 2-Lipschitz, and for any $0 < \delta < 1$,

$$\max_{x \in C} u_p(x) - \max_{x \in C_\delta} u_p(x) \le 6\delta\gamma.$$

Now we show how to use LearnPriceN to provide a noisy evaluation for $u_p$ at each point of $C_\delta$ (the scale
of $\delta$ is determined in the analysis). For each $p$ that LearnPriceN returns, the realized contribution vector we
observe is $\tilde x = x^*(p) + \theta$, so the utility experienced by the principal is

$$u_p(p, \tilde x) = \langle v - p, \tilde x\rangle.$$

We first demonstrate that $u_p(p, \tilde x)$ gives an unbiased estimate for $u_p$, and we can obtain an accurate estimate
by taking the average of a small number of realized utilities. In the following, let the constant $a = \ln 2/(2\pi)$.

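Lemma 33. Let $x' \in C$ be the contribution vector such that $p' = \nabla c(x')$ is the unique price vector that
induces $x'$. Let noise vectors $\theta^1, \ldots, \theta^s \sim \mathcal{N}(0, I)$ and $\tilde x^j = x' + \theta^j$ for each $j \in [s]$. Then with probability
at least $1 - \beta$,

$$\left|\frac{1}{s}\sum_{j=1}^s u_p(p', \tilde x^j) - u_p(x')\right| \le \sqrt{\frac{2d\,\ln(2/\beta)}{a s}}.$$

Proof. Let $b = v - p'$, then we can write

$$
\begin{aligned}
\frac{1}{s}\sum_{j=1}^s u_p(p', \tilde x^j) - u_p(x') &= \frac{1}{s}\sum_{j=1}^s \left(\langle b, \tilde x^j\rangle - \langle b, x'\rangle\right) \\
&= \frac{1}{s}\sum_{j=1}^s \langle b, \theta^j\rangle \\
&= \frac{1}{s}\sum_{j=1}^s\sum_{i=1}^d b_i\,\theta^j_i.
\end{aligned}
$$

Note that each $\theta^j_i$ is sampled from the Gaussian distribution $\mathcal{N}(0, 1)$, and we use the fact that if $X \sim \mathcal{N}(0, \sigma_1^2)$
and $Y \sim \mathcal{N}(0, \sigma_2^2)$ are independent then $(bX + cY) \sim \mathcal{N}(0, b^2\sigma_1^2 + c^2\sigma_2^2)$. We can further derive that
$\frac{1}{s}\sum_{j=1}^s\sum_{i=1}^d b_i\theta^j_i$ is a random variable with distribution $\mathcal{N}(0, \|b\|^2/s)$. Then we will use the following fact about Gaussian
tails: let $Y$ be a random variable sampled from distribution $\mathcal{N}(0, \iota^2)$ and $a = \ln 2/(2\pi)$, then for all $\zeta > 0$

$$\Pr[|Y| > \zeta] \le 2\exp\left(-a\zeta^2/\iota^2\right).$$

It follows that with probability at least $1 - \beta$, we have

$$\left|\frac{1}{s}\sum_{j=1}^s\sum_{i=1}^d b_i\,\theta^j_i\right| \le \|b\|\sqrt{\frac{\ln(2/\beta)}{a s}}.$$

Finally, note that we can bound $\|b\| = \|v - p'\| \le \sqrt{2d}$, so replacing $\|b\|$ by $\sqrt{2d}$ recovers our bound. □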
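
The concentration step in the proof can be illustrated with a short simulation. The sketch below (standard-library Python; the vector `b`, the sample size `s`, and the function names are hypothetical choices, not values from the paper) draws the averaged noise term $\frac{1}{s}\sum_j \langle b, \theta^j\rangle$ many times and compares its empirical standard deviation with the predicted value $\|b\|/\sqrt{s}$.

```python
import math
import random
import statistics


def averaged_error(b, s, rng):
    """One draw of (1/s) * sum_j <b, theta^j> with theta^j ~ N(0, I)."""
    total = 0.0
    for _ in range(s):
        total += sum(bi * rng.gauss(0.0, 1.0) for bi in b)
    return total / s


def empirical_std(b, s, trials=2000, seed=1):
    """Standard deviation of the averaged error over independent repetitions."""
    rng = random.Random(seed)
    draws = [averaged_error(b, s, rng) for _ in range(trials)]
    return statistics.pstdev(draws)


b = [0.5, -0.3, 0.8]   # hypothetical b = v - p'
s = 50                 # hypothetical number of realized utilities averaged
predicted = math.sqrt(sum(bi * bi for bi in b)) / math.sqrt(s)
observed = empirical_std(b, s)
```

Because the standard deviation shrinks like $1/\sqrt{s}$, halving the evaluation error requires roughly four times as many posted prices, which is the trade-off behind the choice of $s$ in the algorithm that follows.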

Now we are ready to give the algorithm to optimize the principal’s utility in Algorithm 6. 
Algorithm 6 Learning the price vector to optimize under noisy observations: OproN($C$, $\alpha$, $\beta$)
Input: Feasible bundle space $C$, target accuracy $\alpha$, and confidence parameter $\beta$
Initialize:

$$\varepsilon = \frac{\alpha}{12\gamma + 3d}, \qquad \delta = 2\varepsilon, \qquad \alpha' = 3d\varepsilon, \qquad \beta' = \frac{\beta}{2T}, \qquad s = \frac{2d\,\ln(2/\beta')}{a\,\varepsilon^2},$$

restricted bundle space $C_\delta = (1 - 2\delta)\,C + \delta\mathbf{1}$ and number of iterations $T = \tilde O(d^{4.5})$

For $t = 1, \ldots, T$:

    ZOO($\alpha'$, $C_\delta$) queries the profit for bundle $x^t$
    Let $p^t$ = LearnPriceN($x^t$, $\varepsilon$, $\beta'$)

    For $j = 1, \ldots, s$:

        Principal posts price $p^t$

        Let $\tilde x^j(p^t)$ be the realized contribution; the principal experiences utility $u_p(p^t, \tilde x^j(p^t))$

    Send $\frac{1}{s}\sum_{j=1}^s u_p(p^t, \tilde x^j(p^t))$ to ZOO($\alpha'$, $C_\delta$) as an approximate evaluation of $u_p(x^t)$

$\hat x$ = ZOO($\alpha'$, $C_\delta$)

$\hat p$ = LearnPrice($\hat x$, $\varepsilon$)

Output: the last price vector $\hat p$


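Theorem 34. Let $\alpha > 0$ and $0 < \beta < 1/2$. With probability at least $1 - \beta$, the price vector $\hat p$ output by
OproN($C$, $\alpha$, $\beta$) satisfies

$$\mathbb{E}\left[u_p(\hat p, x^*(\hat p))\right] \ge \max_{p \in \mathcal{P}} u_p(p, x^*(p)) - \alpha,$$

and the number of observations on realized contributions is bounded by

$$\tilde O\!\left(\frac{d^{9.5}}{\alpha^4}\right).$$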



Proof. First, by Lemma 31 and a union bound, with probability at least $1 - \beta/2$, we have $\|x^t - x^*(p^t)\| \le \varepsilon$
for all $t \in [T]$. We condition on this level of accuracy for the rest of the proof. By the same analysis
as in Footnote 4, we know that each target contribution $x^*(p^t)$ is in the interior $\mathrm{int}_C$, so we have that

$$u_p(x^*(p^t)) = u_p(p^t, x^*(p^t)).$$

To establish the accuracy guarantee, we need to bound two sources of error. First, we need to bound the
error from ZOO. Note that the target contribution $x^*(p^t)$ satisfies

$$\left|u_p(x^t) - u_p(x^*(p^t))\right| \le 2\varepsilon.$$

By Lemma 33 and our setting of $s$, we have with probability at least $1 - \beta'$ that

$$\left|\frac{1}{s}\sum_{j=1}^s u_p(p^t, \tilde x^j(p^t)) - u_p(x^*(p^t))\right| \le \varepsilon.$$

By a union bound, we know such accuracy holds for all $t \in [T]$ with probability at least $1 - \beta/2$. We condition
on this level of accuracy, then the average utility provides an accurate evaluation for $u_p(x^t)$ at each queried
point $x^t$:

$$\left|\frac{1}{s}\sum_{j=1}^s u_p(p^t, \tilde x^j(p^t)) - u_p(x^t)\right| \le 3\varepsilon.$$
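By Lemma 7, we know that the vector $\hat x$ output by ZOO satisfies

$$\mathbb{E}\left[u_p(\hat x)\right] \ge \max_{x \in C_\delta} u_p(x) - 3d\varepsilon.$$

Finally, by Lemma 32 and the value of $\varepsilon$, we also have

$$\mathbb{E}\left[u_p(\hat x)\right] \ge \max_{x \in C} u_p(x) - 3d\varepsilon - 6\delta\gamma = \max_{x \in C} u_p(x) - \alpha.$$

Note $\max_{x \in C} u_p(x) = \max_{p \in \mathcal{P}} u_p(p, x^*(p))$, so we have shown the accuracy guarantee. In each iteration,
the algorithm requires $O\!\left(\frac{d\gamma^2}{\varepsilon^4\sigma^2}\right)$ noisy observations for running LearnPriceN and $s$ observations for
estimating $u_p(x^*(p^t))$, so the total number of observations is bounded by

$$O\!\left(d^{4.5}\left(\frac{\gamma^2 d\,(\gamma + d)^4}{\sigma^2\alpha^4} + \frac{d\,(\gamma + d)^2\ln(2/\beta')}{a\,\alpha^2}\right)\right) = \tilde O\!\left(\frac{d^{9.5}}{\alpha^4}\right),$$

where we hide constants $\sigma$, $\gamma$ in the last equality. □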


7 Conclusion 

In this paper, we have given algorithms for optimally solving a large class of Stackelberg games in which the 
leader has only “revealed preferences” feedback about the follower’s utility function, with applications both 
to profit maximization from revealed preferences data, and optimal tolling in congestion games. We believe 
this is a very natural model in which to have access to agent utility functions, and that pursuing this line 
of work will be fruitful. There are many interesting directions, but let us highlight one in particular. In our 
profit maximization application, it would be very natural to consider a “Bayesian” version of our problem. 
At each round, the producer sets prices, at which point a new consumer, with valuation function drawn from 
an unknown prior, purchases her utility maximizing bundle. The producer’s goal is to find the prices that 
maximize her expected profit, over draws from the unknown prior. Under what conditions can we solve this 
problem efficiently? The main challenge (and the reason why it likely requires new techniques) is that the 
expected value of the purchased bundle need not maximize any well-behaved utility function, even if each 
individual consumer is maximizing a concave utility function. 
A A Routing Game Where Social Cost is Not Convex in the Tolls

As we stated in the introduction, we can give a simple example of a routing game in which the function
mapping a set of tolls on each of the edges to the social cost of the equilibrium routing in the routing game
induced by those tolls is not a convex function of the tolls. The example is related to the canonical examples
of Braess’ Paradox in routing games.

Let $SC(\tau_1, \tau_2)$ be the function that maps a pair of tolls for the two $A \to B$ edges to the social cost
(excluding the tolls) of the equilibrium routing. For each of the inputs we consider, the equilibrium will be
unique, so multiplicity of equilibria is irrelevant.
Figure 1: A routing game in which the function mapping tolls to social cost of the unique equilibrium routing is not convex. In
this example, there are $n$ players each trying to route $1/n$ units of flow from $S$ to $T$. There are two edges from $A$ to $B$ with tolls $\tau_1$
and $\tau_2$, and we assume without loss of generality that all other tolls are fixed to 0. Each edge is labeled with the latency function
indicating the cost of using that edge when the congestion on that edge is $x \in [0,1]$. Note that the latencies (excluding the tolls) on
every edge are bounded in $[0, 1]$.

First, consider the set of tolls $\tau = (\tau_1, \tau_2) = (0, 0)$. It is not hard to verify that the unique equilibrium is for
every player to use the route $S \to A \to B \to T$ using the $A \to B$ edge on the right (with latency 0).⁶ Each
player will experience a total latency of 1 along their route. Thus $SC(\tau) = 8n/10$.

Now consider the tolls $\tau'$ in which $\tau_1 = 1$, $\tau_2 = 2$. At these tolls, it is not hard to verify that the unique
equilibrium is for $n/2$ players to use $S \to A \to T$ and half to use $S \to B \to T$.⁷ Every player experiences
a total latency of $2/10 + 1/2 = 7/10$. Thus $SC(\tau') = 7n/10$.

Finally, consider the convex combination $\tau'' = 99\tau/100 + \tau'/100$ in which $\tau_1 = 1/100$ and $\tau_2 = 1/50$. In
this case, the unique equilibrium routing will have every player use the route $S \to A \to B \to T$ but
using the $A \to B$ edge on the left (with latency 1/200 and latency-plus-toll 3/200). To see why, observe
that if a player were at $A$, then no matter what the other players are doing, the cheapest path to $T$ is to
go $A \to B \to T$ using the left edge (note that the right edge has latency-plus-toll 1/50 whereas the left
edge has latency-plus-toll 3/200). Thus, the cost of going $B \to T$ is exactly 1/2 and the cost of going
$A \to B \to T$ is exactly $1/2 + 3/200$. Now, if the player is at $S$, going $S \to B \to T$ costs exactly 1,
whereas going $S \to A \to B \to T$ costs at most $4/10 + 1/2 + 3/200 = 183/200 < 1$. Thus, every
player will choose the path $S \to A \to B \to T$, using the left $A \to B$ edge. Every player experiences a
total latency of exactly 183/200. Thus, $SC(\tau'') = 183n/200$.

6 Since the graph is a DAG, we can use backwards induction. From $A$, it can never cost more to go $A \to B \to T$ than to go
$A \to T$. Since one can go from $A$ to $B$ for a cost of 0, players are indifferent about ending up at node $A$ and node $B$. Since
$S \to A$ can never cost more than $S \to B$, and players are indifferent between $A$ and $B$, every player would choose the path
$S \to A \to B \to T$ (using the 0 latency path from $A$ to $B$).

7 At these tolls, no player will ever use either $A \to B$ edge. Thus, they will balance the traffic so that $S \to A \to T$ and
$S \to B \to T$ have equal cost. By symmetry, half will go through $A$ and half through $B$.
But, since

$$SC\!\left(\tfrac{99}{100}\tau + \tfrac{1}{100}\tau'\right) = \frac{183n}{200} > \frac{99}{100}\cdot\frac{8n}{10} + \frac{1}{100}\cdot\frac{7n}{10} = \frac{99}{100}\,SC(\tau) + \frac{1}{100}\,SC(\tau'),$$

we can see that the function $SC(\tau)$ is not convex in $\tau$.
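
The comparison above can be checked with exact rational arithmetic. This small Python sketch only re-does the arithmetic on the three social-cost values quoted in the text (divided by the common factor $n$); the variable names are illustrative.

```python
from fractions import Fraction

# Per-player social costs quoted in the text (the common factor n cancels).
sc_tau = Fraction(8, 10)          # tolls (0, 0)
sc_tau_prime = Fraction(7, 10)    # tolls (1, 2)
sc_mix = Fraction(183, 200)       # tolls (1/100, 1/50)

weight = Fraction(99, 100)
# Value a convex function could not exceed at the mixed toll vector.
chord = weight * sc_tau + (1 - weight) * sc_tau_prime

# Convexity would require sc_mix <= chord, i.e. a non-positive gap.
gap = sc_mix - chord
print(chord, gap)  # prints: 799/1000 29/250
```

The gap is strictly positive, so the value at the mixed toll vector lies above the chord between the two endpoints, which is exactly the failure of convexity claimed.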


B Missing Proofs in Section 3 

Lemma 35. Suppose a function $v\colon \mathbb{R}^d_+ \to \mathbb{R}_+$ is concave and homogeneous of some degree $k \ge 0$. Then
$k \le 1$.

Proof. First, we show that $v(0) = 0$. To see this, observe that for any $b > 1$, we can write $v(0) = v(b \cdot 0) =
b^k v(0)$. For any $x \in \mathbb{R}^d_+$ such that $x \ne 0$, we have the following due to the concavity of $v$:

$$\frac{v(x)}{2} = \frac{1}{2}\left[v(0) + v(x)\right] \le v(x/2) = \frac{1}{2^k}\,v(x).$$

This means that $k \le 1$. □


B.1 Properties of CES and Cobb-Douglas Utilities

In this sub-section, we give proofs showing that both CES and Cobb-Douglas utility functions are strongly
concave and Hölder continuous in the convex region $(0, H]^d$.


B.1.1 Constant Elasticity of Substitution (CES)

Consider valuation functions of the form:

$$v(x) = \left(\sum_{i=1}^d \alpha_i x_i^\rho\right)^\beta,$$

where $\alpha_i > 0$ for every $i \in [d]$ and $\rho, \beta > 0$ such that $\rho < 1$ and $\beta\rho < 1$.

Theorem 36. Let $C$ be a convex set such that there exists some constant $H > 0$ such that $C \subseteq
(0, H]^d$. Then $v$ is $R$-strongly concave over $C$ for some constant $R$ and is $\left(((\max_i \alpha_i)\,d)^\beta, \rho\beta\right)$-Hölder
continuous.



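Proof. We will derive the Hessian matrix $\nabla^2 v$ of the function $v$, and show that there exists some fixed
$R > 0$ such that for every $x \in C$ and every $y \in \mathbb{R}^d$ we have $y^\top \nabla^2 v(x)\,y \le -R\|y\|^2$. First, we have the
first partial derivatives

$$\frac{\partial v}{\partial x_i} = \beta\left(\sum_{k=1}^d \alpha_k x_k^\rho\right)^{\beta-1}\rho\,\alpha_i x_i^{\rho-1}.$$

Now we take the second partial derivatives. For any $i \ne j$,

$$\frac{\partial^2 v}{\partial x_i\,\partial x_j} = \beta(\beta-1)\left(\sum_{k=1}^d \alpha_k x_k^\rho\right)^{\beta-2}\left(\rho\,\alpha_i x_i^{\rho-1}\right)\left(\rho\,\alpha_j x_j^{\rho-1}\right),$$

and for any $i$,

$$\frac{\partial^2 v}{\partial x_i^2} = \beta(\beta-1)\left(\sum_{k=1}^d \alpha_k x_k^\rho\right)^{\beta-2}\left(\rho\,\alpha_i x_i^{\rho-1}\right)^2 + \beta\left(\sum_{k=1}^d \alpha_k x_k^\rho\right)^{\beta-1}\left(\rho(\rho-1)\,\alpha_i x_i^{\rho-2}\right).$$

Recall that the $ij$-th entry of the Hessian matrix is $(\nabla^2 v)_{ij} = \partial^2 v/\partial x_i\partial x_j$, and we could write

$$
\begin{aligned}
y^\top\left(\nabla^2 v(x)\right)y &= \sum_{i=1}^d\sum_{j=1}^d \frac{\partial^2 v}{\partial x_i\,\partial x_j}\,y_i y_j \\
&= \sum_{i \ne j}\frac{\partial^2 v}{\partial x_i\,\partial x_j}\,y_i y_j + \sum_{i=1}^d \frac{\partial^2 v}{\partial x_i^2}\,y_i^2 \\
&= 2\beta(\beta-1)\rho^2\sum_{i<j}\alpha_i\alpha_j\left(\sum_{k=1}^d \alpha_k x_k^\rho\right)^{\beta-2}\left(x_i^{\rho-1}y_i\right)\left(x_j^{\rho-1}y_j\right) \\
&\quad + \sum_{i=1}^d \beta(\beta-1)\left(\sum_{k=1}^d \alpha_k x_k^\rho\right)^{\beta-2}\left(\rho\,\alpha_i x_i^{\rho-1}y_i\right)^2 + \sum_{i=1}^d \beta\left(\sum_{k=1}^d \alpha_k x_k^\rho\right)^{\beta-1}\rho(\rho-1)\,\alpha_i x_i^{\rho-2}y_i^2 \\
&= \beta(\beta-1)\rho^2\left(\sum_{k=1}^d \alpha_k x_k^\rho\right)^{\beta-2}\left(\sum_{k=1}^d \alpha_k x_k^{\rho-1}y_k\right)^2 + \beta\rho(\rho-1)\left(\sum_{k=1}^d \alpha_k x_k^\rho\right)^{\beta-1}\left(\sum_{k=1}^d \alpha_k x_k^{\rho-2}y_k^2\right).
\end{aligned}
$$

We will first consider the case where $\beta \le 1$. Then both of the terms above are non-positive, and

$$
\begin{aligned}
y^\top\left(\nabla^2 v(x)\right)y &\le -\beta\rho(1-\rho)\left(\sum_{k=1}^d \alpha_k x_k^\rho\right)^{\beta-1}\left(\sum_{k=1}^d \alpha_k x_k^{\rho-2}y_k^2\right) \\
&\le -\beta\rho(1-\rho)\left(\sum_{k=1}^d \alpha_k H^\rho\right)^{\beta-1}\min_k\left\{\alpha_k x_k^{\rho-2}\right\}\left(\sum_{k=1}^d y_k^2\right) \\
&\le -\beta\rho(1-\rho)\left(\sum_{k=1}^d \alpha_k\right)^{\beta-1}H^{\rho(\beta-1)}\min_k\{\alpha_k\}\,H^{\rho-2}\,\|y\|^2 \\
&= -\beta\rho(1-\rho)\left(\sum_{k=1}^d \alpha_k\right)^{\beta-1}\min_k\{\alpha_k\}\,H^{\rho\beta-2}\,\|y\|^2.
\end{aligned}
$$

Thus, $y^\top(\nabla^2 v(x))\,y \le -R\|y\|^2$ for $R = \beta\rho(1-\rho)\left(\sum_{k=1}^d \alpha_k\right)^{\beta-1}\min_k\{\alpha_k\}\,H^{\rho\beta-2}$.

Now we consider the case where $\beta > 1$. Since we have assumed that $\rho\beta < 1$, we also know that
$(\beta-1)\rho < 1-\rho$. Let $\kappa = \frac{(\beta-1)\rho}{1-\rho}$ and we know that $0 < \kappa < 1$. It follows that

$$
\begin{aligned}
y^\top\left(\nabla^2 v(x)\right)y &= \beta\rho(1-\rho)\left(\sum_{k=1}^d \alpha_k x_k^\rho\right)^{\beta-2}\left[\kappa\left(\sum_{k=1}^d \alpha_k x_k^{\rho-1}y_k\right)^2 - \left(\sum_{k=1}^d \alpha_k x_k^\rho\right)\left(\sum_{k=1}^d \alpha_k x_k^{\rho-2}y_k^2\right)\right] \\
&= \beta\rho(1-\rho)\left(\sum_{k=1}^d \alpha_k x_k^\rho\right)^{\beta-2}\Bigg[\kappa\left(\left(\sum_{k=1}^d \alpha_k x_k^{\rho-1}y_k\right)^2 - \left(\sum_{k=1}^d \alpha_k x_k^\rho\right)\left(\sum_{k=1}^d \alpha_k x_k^{\rho-2}y_k^2\right)\right) \\
&\qquad\qquad - (1-\kappa)\left(\sum_{k=1}^d \alpha_k x_k^\rho\right)\left(\sum_{k=1}^d \alpha_k x_k^{\rho-2}y_k^2\right)\Bigg] \\
\text{(Cauchy-Schwarz)}\quad &\le -\beta\rho(1-\rho)(1-\kappa)\left(\sum_{k=1}^d \alpha_k x_k^\rho\right)^{\beta-1}\left(\sum_{k=1}^d \alpha_k x_k^{\rho-2}y_k^2\right) \\
&\le -\beta\rho(1-\rho)(1-\kappa)\sum_{k=1}^d \alpha_k^\beta x_k^{\rho\beta-2}y_k^2 \\
&\le -\beta\rho(1-\rho)(1-\kappa)\,\min_k \inf_{x \in C}\left\{\alpha_k^\beta x_k^{\rho\beta-2}\right\}\|y\|^2 \\
&\le -\beta\rho(1-\rho)(1-\kappa)\,\min_k\left\{\alpha_k^\beta\right\}H^{\rho\beta-2}\,\|y\|^2.
\end{aligned}
$$

This means $y^\top(\nabla^2 v(x))\,y \le -R\|y\|^2$ for $R = \beta\rho(1-\rho)(1-\kappa)\min_k\{\alpha_k^\beta\}\,H^{\rho\beta-2}$. Therefore, we have
shown that $v$ is $R$-strongly concave in $C$ for some positive constant $R$.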

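Next, we will show that the function is Hölder continuous over $C$. Let $x, y \in C$ such that $x \ne y$.
Without loss of generality, assume that $v(x) \ge v(y)$, and let $\varepsilon_i = |x_i - y_i|$ for each $i \in [d]$. Then we have

$$
\begin{aligned}
\left(\sum_{i=1}^d \alpha_i x_i^\rho\right)^\beta - \left(\sum_{i=1}^d \alpha_i y_i^\rho\right)^\beta &\le \left(\sum_{i=1}^d \alpha_i |x_i - y_i|^\rho\right)^\beta \\
&\le \left(\max_i \alpha_i\right)^\beta \cdot \left(\sum_{i=1}^d \varepsilon_i^\rho\right)^\beta \\
&\le \left(\max_i \alpha_i\right)^\beta \cdot \left(d\,\|x - y\|_2^\rho\right)^\beta \\
&\le \left(\max_i \alpha_i\right)^\beta \cdot d^\beta \cdot \|x - y\|_2^{\rho\beta},
\end{aligned}
$$

where the first step follows from the sub-additivity. This shows that the function is $\left(((\max_i \alpha_i)\,d)^\beta, \rho\beta\right)$-
Hölder continuous over $C$, which completes the proof. □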
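
The Hölder bound can be spot-checked numerically. The following Python sketch, under the assumption $\beta \le 1$ (the regime in which the sub-additivity step applies directly) and with arbitrarily chosen illustrative parameters, samples random pairs in $(0, H]^d$ and reports the largest amount by which $|v(x) - v(y)|$ exceeds $((\max_i \alpha_i)\,d)^\beta\,\|x - y\|^{\rho\beta}$; a non-positive value means no violation was found. The function names and the default $H = 2$ are hypothetical.

```python
import math
import random


def ces(x, alpha, rho, beta):
    """CES valuation v(x) = (sum_i alpha_i * x_i^rho)^beta."""
    return sum(a * xi ** rho for a, xi in zip(alpha, x)) ** beta


def holder_bound(x, y, alpha, rho, beta):
    """Right-hand side ((max_i alpha_i) * d)^beta * ||x - y||_2^(rho * beta)."""
    d = len(x)
    dist = math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))
    return (max(alpha) * d) ** beta * dist ** (rho * beta)


def max_violation(alpha, rho, beta, H=2.0, trials=5000, seed=0):
    """Largest value of |v(x) - v(y)| - bound over random pairs in (0, H]^d."""
    rng = random.Random(seed)
    d = len(alpha)
    worst = -math.inf
    for _ in range(trials):
        # 1 - random() lies in (0, 1], so every coordinate lies in (0, H].
        x = [H * (1.0 - rng.random()) for _ in range(d)]
        y = [H * (1.0 - rng.random()) for _ in range(d)]
        lhs = abs(ces(x, alpha, rho, beta) - ces(y, alpha, rho, beta))
        worst = max(worst, lhs - holder_bound(x, y, alpha, rho, beta))
    return worst
```

For example, `max_violation([0.5, 1.0, 2.0], rho=0.5, beta=0.8)` should not be positive; random sampling cannot prove the bound, it can only fail to find a counterexample.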


B.1.2 Cobb-Douglas

Consider valuation functions of the form

$$v(x) = \prod_{i=1}^d x_i^{\alpha_i},$$

where $\alpha_i > 0$ for every $i \in [d]$ and $\sum_{i=1}^d \alpha_i < 1$.

Theorem 37. Let $C$ be a convex set such that there exists some constant $H > 0$ such that $C \subseteq (0, H]^d$.
Then $v$ is $(1, \sum_i \alpha_i)$-Hölder continuous and $R$-strongly concave over $C$ for some constant $R$.


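Proof. Similar to the previous proof, we will show that there exists some constant $R > 0$ such that for every
$x \in C$ and every $y \in \mathbb{R}^d$, we have $y^\top \nabla^2 v(x)\,y \le -R\|y\|^2$. First, we could write down the following first
and second partial derivatives of the function:

$$\frac{\partial v}{\partial x_i} = \alpha_i x_i^{\alpha_i - 1}\prod_{j \ne i} x_j^{\alpha_j}, \qquad \frac{\partial^2 v}{\partial x_i^2} = \alpha_i(\alpha_i - 1)\,x_i^{\alpha_i - 2}\prod_{j \ne i} x_j^{\alpha_j},$$

and for any $i \ne j$,

$$\frac{\partial^2 v}{\partial x_i\,\partial x_j} = \alpha_i x_i^{\alpha_i - 1}\,\alpha_j x_j^{\alpha_j - 1}\prod_{k \ne i,j} x_k^{\alpha_k}.$$

Let $y \in \mathbb{R}^d$, and let $\kappa = \sum_{i=1}^d \alpha_i \in (0, 1)$.

$$
\begin{aligned}
y^\top\left(\nabla^2 v(x)\right)y &= \sum_{i=1}^d\sum_{j=1}^d y_i\,\frac{\partial^2 v}{\partial x_i\,\partial x_j}\,y_j \\
&= 2\sum_{i<j}\prod_{k \ne i,j} x_k^{\alpha_k}\left(\alpha_i x_i^{\alpha_i-1}y_i\right)\left(\alpha_j x_j^{\alpha_j-1}y_j\right) + \sum_{i=1}^d \alpha_i(\alpha_i - 1)\,x_i^{\alpha_i-2}y_i^2\prod_{j \ne i} x_j^{\alpha_j} \\
&= \left(\prod_{k=1}^d x_k^{\alpha_k}\right)\left[2\sum_{i<j}\left(\alpha_i x_i^{-1}y_i\right)\left(\alpha_j x_j^{-1}y_j\right) + \sum_{i=1}^d \alpha_i(\alpha_i - 1)\,x_i^{-2}y_i^2\right] \\
&= \left(\prod_{k=1}^d x_k^{\alpha_k}\right)\left[\left(\sum_{i=1}^d \alpha_i x_i^{-1}y_i\right)^2 - \sum_{i=1}^d \alpha_i x_i^{-2}y_i^2\right] \\
\text{(Cauchy-Schwarz)}\quad &\le \left(\prod_{k=1}^d x_k^{\alpha_k}\right)\left[\left(\sum_{i=1}^d \alpha_i\right)\left(\sum_{i=1}^d \alpha_i x_i^{-2}y_i^2\right) - \sum_{i=1}^d \alpha_i x_i^{-2}y_i^2\right] \\
&= -\left(\prod_{k=1}^d x_k^{\alpha_k}\right)(1 - \kappa)\left(\sum_{i=1}^d \alpha_i x_i^{-2}y_i^2\right) \\
&\le -H^{\sum_k \alpha_k - 2}\,(1 - \kappa)\,\min_k\{\alpha_k\}\,\|y\|^2.
\end{aligned}
$$

Therefore, $v$ is $R$-strongly concave in $C$ for $R = H^{\sum_k \alpha_k - 2}\,(1 - \kappa)\,\min_k\{\alpha_k\}$.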
Next, we will show that the function is $(1, \sum_i \alpha_i)$-Hölder continuous. Let $x, y \in C$ such that $x \ne y$.
Without loss of generality, assume that $v(x) \ge v(y)$. We could write

$$\prod_{i=1}^d x_i^{\alpha_i} - \prod_{i=1}^d y_i^{\alpha_i} \le \prod_{i=1}^d |x_i - y_i|^{\alpha_i} \le \prod_{i=1}^d \|x - y\|^{\alpha_i} = \|x - y\|^{\sum_i \alpha_i},$$

where the first step follows from the sub-additivity of $v$. This shows that the function is $(1, \sum_i \alpha_i)$-Hölder
continuous, which completes the proof. □

C Detailed Analysis of Section 4 

Just as in Section 3.4, we start by considering the following convex program associated with $\hat x$:

$$\max_{x \in \mathcal{A}_F}\; \phi(x) \qquad (14)$$

$$\text{such that}\quad x_j \le \hat x_j \ \text{ for every } j \in [d] \qquad (15)$$

The Lagrangian of the program is

$$\mathcal{L}(x, p) = \phi(x) - \sum_{j=1}^d p_j\,(x_j - \hat x_j),$$

where $p_j$ is the dual variable for each constraint (15) and the vector $p$ is also interpreted as the action of the
leader. We can interpret the Lagrangian as the payoff function of a zero-sum game, in which the leader is the
minimization player and the follower is the maximization player. To guarantee the fast convergence of our
later use of gradient descent, we will restrict the leader to play actions in $\mathcal{P}$ as defined in (11). It is known
that such a zero-sum game has value $V$: the leader has a minimax strategy $p^*$ such that $\mathcal{L}(x, p^*) \le V$ for all
$x \in \mathcal{A}_F$, and the follower has a maxmin strategy $x^*$ such that $\mathcal{L}(x^*, p) \ge V$ for all $p \in \mathcal{P}$. We first state the
following lemma about the value of this zero-sum game.

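Lemma 38. The value of the Lagrangian zero-sum game is $V = \phi(\hat x)$, that is

$$\max_{x \in \mathcal{A}_F}\min_{p \in \mathcal{P}} \mathcal{L}(x, p) = \min_{p \in \mathcal{P}}\max_{x \in \mathcal{A}_F} \mathcal{L}(x, p) = \phi(\hat x).$$

Proof. The proof is identical to the proof of Lemma 13. □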

An approximate minimax equilibrium of a zero-sum game is defined as follows. 

Definition 39. Let $\alpha \ge 0$. A pair of strategies $p \in \mathcal{P}$ and $x \in \mathcal{A}_F$ form an $\alpha$-approximate minimax
equilibrium if

$$\mathcal{L}(x, p) \ge \max_{x' \in \mathcal{A}_F} \mathcal{L}(x', p) - \alpha \ge V - \alpha \quad\text{and}\quad \mathcal{L}(x, p) \le \min_{p' \in \mathcal{P}} \mathcal{L}(x, p') + \alpha \le V + \alpha.$$

First observe that fixing a strategy $p$, the maximization player’s best response in this zero-sum game is
the same as the follower’s best response in the Stackelberg game.
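Claim 40. Let $p' \in \mathcal{P}$ be any action of the leader, then

$$\arg\max_{x \in \mathcal{A}_F} \mathcal{L}(x, p') = x^*(p').$$

Proof. We can write

$$
\begin{aligned}
\arg\max_{x \in \mathcal{A}_F} \mathcal{L}(x, p') &= \arg\max_{x \in \mathcal{A}_F}\left[\phi(x) - \langle p', x - \hat x\rangle\right] \\
&= \arg\max_{x \in \mathcal{A}_F}\left[\phi(x) - \langle p', x\rangle\right] \\
&= \arg\max_{x \in \mathcal{A}_F} U_F(p', x).
\end{aligned}
$$

The second equality follows from the fact that $\langle p', \hat x\rangle$ is independent of the choice of $x$. □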


Now we show that if a strategy pair $(p', x')$ is an approximate minimax equilibrium in the zero-sum
game, then $p'$ approximately induces the target action $\hat x$ of the follower. Hence, our task reduces to finding
an approximate minimax equilibrium of the Lagrangian game.

Lemma 41. Suppose that a pair of strategies $(p', x') \in \mathcal{P} \times \mathcal{A}_F$ forms an $\alpha$-approximate minimax
equilibrium. Then the induced follower action $x^*(p')$ satisfies $\|\hat x - x^*(p')\| \le \sqrt{4\alpha/\sigma}$.

Proof. By the definition of approximate minimax equilibrium, we have

$$\phi(\hat x) - \alpha \le \mathcal{L}(x', p') \le \phi(\hat x) + \alpha$$

and also by Claim 40,

$$\max_{x \in \mathcal{A}_F} \mathcal{L}(x, p') - \alpha = \mathcal{L}(x^*(p'), p') - \alpha \le \mathcal{L}(x', p') \le \phi(\hat x) + \alpha.$$

Note that

$$\mathcal{L}(\hat x, p') = \phi(\hat x) - \langle p', \hat x - \hat x\rangle = \phi(\hat x).$$

It follows that $\mathcal{L}(x^*(p'), p') \le \mathcal{L}(\hat x, p') + 2\alpha$. Since $\phi$ is a $\sigma$-strongly concave function, we know that fixing
any leader’s action $p$, $\mathcal{L}$ is also a $\sigma$-strongly concave function in $x$. By Lemma 5 and the above argument,
we have

$$2\alpha \ge \mathcal{L}(x^*(p'), p') - \mathcal{L}(\hat x, p') \ge \frac{\sigma}{2}\,\|x^*(p') - \hat x\|^2.$$

Hence, we must have $\|x^*(p') - \hat x\| \le \sqrt{4\alpha/\sigma}$. □


To compute an approximate minimax equilibrium, we will use the following $T$-round no-regret
dynamics: the leader plays online gradient descent (a “no-regret” algorithm), while the follower selects a
$\zeta$-approximate best response every round. In particular, the leader will produce a sequence of actions
$\{p^1, \ldots, p^T\}$ against the follower’s best responses $\{x^1, \ldots, x^T\}$ such that for each round $t \in [T]$:

$$p^{t+1} = \Pi_{\mathcal{P}}\left[p^t - \eta \cdot \nabla_p \mathcal{L}(x^t, p^t)\right] \quad\text{and}\quad x^t = x'(p^t).$$

At the end of the dynamics, the leader has regret defined as

$$\mathcal{R}_L = \frac{1}{T}\sum_{t=1}^T \mathcal{L}(x^t, p^t) - \frac{1}{T}\min_{p \in \mathcal{P}}\sum_{t=1}^T \mathcal{L}(x^t, p).$$

Now take the average actions for both players for this dynamics: $\bar x = \frac{1}{T}\sum_{t=1}^T x^t$ and $\hat p = \frac{1}{T}\sum_{t=1}^T p^t$.
A well-known result by [FS96] shows that the average plays form an approximate minimax equilibrium.

Theorem 42 ([FS96]). The average action pair $(\hat p, \bar x)$ forms an $(\mathcal{R}_L + \zeta)$-approximate minimax equilibrium
of the Lagrangian zero-sum game.

To simulate the no-regret dynamics, we will have the following $T$-round dynamics between the leader
and the follower: in each round $t$, the leader plays action $p^t$ based on the gradient descent update and observes
the induced action $x^*(p^t)$. The gradient of the Lagrangian $\nabla_p \mathcal{L}$ can be easily computed based on the
observations. Recall from Claim 40 that the follower’s best responses to the leader’s actions are the same in both
the Stackelberg game and the Lagrangian zero-sum game. This means that the gradient of the Lagrangian
can be obtained as follows

$$\nabla_p \mathcal{L}(x^*(p'), p') = \left(\hat x - x^*(p')\right).$$

In the end, the algorithm will output the average play of the leader $\hat p = \frac{1}{T}\sum_{t=1}^T p^t$. The full description of
the algorithm LearnLead is presented in Algorithm 3, and before we present the proof of Theorem 23, we
first include the no-regret guarantee of gradient descent.
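
The dynamics described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a transcription of Algorithm 3: the follower has the hypothetical utility $\phi(x) - \langle p, x\rangle$ with $\phi(x) = \langle a, x\rangle - \|x\|^2/2$ over $[0,1]^d$ and best-responds exactly, the price set is the nonnegative part of a ball of radius $\sqrt{d}$, and the step size follows the $D/(G\sqrt{T})$ pattern with a crude gradient bound. All function names are illustrative.

```python
import math


def project(p, radius):
    """Euclidean projection onto {p >= 0, ||p||_2 <= radius}."""
    q = [max(0.0, pi) for pi in p]
    norm = math.sqrt(sum(qi * qi for qi in q))
    if norm > radius:
        q = [qi * radius / norm for qi in q]
    return q


def follower_best_response(p, a):
    """Maximizer of phi(x) - <p, x> over [0, 1]^d for phi(x) = <a, x> - ||x||^2 / 2."""
    return [min(1.0, max(0.0, ai - pi)) for ai, pi in zip(a, p)]


def learn_lead(x_target, a, rounds=2000, radius=None):
    """Projected gradient descent on prices; returns the average price vector."""
    d = len(x_target)
    radius = math.sqrt(d) if radius is None else radius
    # Step size D / (G * sqrt(T)) with D = radius and G = sqrt(d), a crude
    # bound on ||x_target - x*(p)|| inside the unit box.
    eta = radius / (math.sqrt(d) * math.sqrt(rounds))
    p = [0.0] * d
    p_sum = [0.0] * d
    for _ in range(rounds):
        p_sum = [s + pi for s, pi in zip(p_sum, p)]
        x = follower_best_response(p, a)                 # observed best response
        grad = [xt - xi for xt, xi in zip(x_target, x)]  # x_hat - x*(p)
        p = project([pi - eta * gi for pi, gi in zip(p, grad)], radius)
    return [s / rounds for s in p_sum]


a = [0.9, 0.8, 0.7]            # hypothetical follower parameters
x_target = [0.5, 0.3, 0.6]     # hypothetical target action
p_hat = learn_lead(x_target, a)
induced = follower_best_response(p_hat, a)
```

In this instance the price that induces the target exactly is $a - \hat x$, and the averaged iterate approaches it; the leader only ever uses the observed responses, never the parameters `a` themselves.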

Lemma 43 ([Zin03]). Let $\mathcal{D}$ be a closed convex set such that $\|\mathcal{D}\| \le D$, and let $c^1, \ldots, c^T$ be a sequence
of differentiable, convex functions with bounded gradients, that is for every $x \in \mathcal{D}$, $\|\nabla c^t(x)\|_2 \le G$. Let
$\eta = \frac{D}{G\sqrt{T}}$ and $\omega^1 = \Pi_{\mathcal{D}}[0]$ be arbitrary. Then if we compute $\omega^1, \ldots, \omega^T \in \mathcal{D}$ based on gradient descent
$\omega^{t+1} = \Pi_{\mathcal{D}}\left[\omega^t - \eta\nabla c^t(\omega^t)\right]$, the regret satisfies

$$\mathcal{R} = \frac{1}{T}\sum_{t=1}^T c^t(\omega^t) - \min_{\omega \in \mathcal{D}}\frac{1}{T}\sum_{t=1}^T c^t(\omega) \le \frac{GD}{\sqrt{T}} \qquad (16)$$


Proof of Theorem 23. We will first bound the regret of the leader in the no-regret dynamics. Each vector of
$\mathcal{P}$ has norm bounded by $\sqrt{d}\,\lambda_F$. Also, since the gradient of the Lagrangian at point $p'$ is

$$\nabla_p \mathcal{L}(x^*(p'), p') = \left(\hat x - x^*(p')\right),$$

we then know that the gradient is bounded by the norm of $\mathcal{A}_F$, which is $\gamma$. The regret is bounded by

$$\mathcal{R}_L = \frac{\sqrt{2d}\,\lambda_F\,\gamma}{\sqrt{T}}.$$

Let $\bar x = \frac{1}{T}\sum_{t=1}^T x^*(p^t)$ denote the average play of the follower. It follows from Theorem 42 that $(\hat p, \bar x)$
forms an $(\mathcal{R}_L + \zeta)$-approximate minimax equilibrium, and so by Lemma 41, we have

$$\|\hat x - x^*(\hat p)\| \le \sqrt{\frac{4(\mathcal{R}_L + \zeta)}{\sigma}}.$$

Plugging in our choice of $T$, we get $\|x^*(\hat p) - \hat x\| \le \varepsilon/2$, and the total number of observations on the
follower is also $T$. Finally, by strong concavity of the function $\phi$ and Lemma 5, we have that

$$\|x'(\hat p) - x^*(\hat p)\| \le \varepsilon/2.$$

By the triangle inequality, we could show that the approximate best-response satisfies $\|x'(\hat p) - \hat x\| \le \varepsilon$. □
Finally, we give the proof of Theorem 24. 
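Proof of Theorem 24. Since for each $x^t$, the approximate evaluation based on $x'(p^t)$ satisfies
$|\psi(x^t) - \psi(x'(p^t))| \le \lambda_L \varepsilon$, by Lemma 7 we know that the action $\hat{x}$ output by ZOO($d\varepsilon\lambda_L$, $\mathcal{A}_F$) satisfies

$$\mathbb{E}[\psi(\hat{x})] \ge \max_{x \in \mathcal{A}_F} \psi(x) - d\varepsilon\lambda_L = \max_{p \in \mathcal{A}_L} U_L(p, x^*(p)) - d\varepsilon\lambda_L.$$

Finally, we will use LearnLead($\hat{x}$, $\varepsilon$) to output a leader action $\hat{p}$ such that $\|x^*(\hat{p}) - \hat{x}\| \le \varepsilon$, and so
$\psi(x^*(\hat{p})) \ge \psi(\hat{x}) - \lambda_L \varepsilon$. Therefore, in the end we guarantee that

$$\mathbb{E}[U_L(\hat{p}, x^*(\hat{p}))] = \mathbb{E}[\psi(x^*(\hat{p}))] \ge \mathbb{E}[\psi(\hat{x})] - \lambda_L \varepsilon \ge \max_{p \in \mathcal{A}_L} U_L(p, x^*(p)) - (d+1)\varepsilon\lambda_L.$$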


Plugging in our choice of $\varepsilon$, we recover the $\alpha$ accuracy guarantee. Note that in each iteration, the number of
observations on the follower required by the call of LearnLead is

$$T' = O\left(\frac{d\lambda_F^2\gamma^2}{\varepsilon^4\sigma^2}\right).$$

Therefore the total number of observations our algorithm needs is bounded by

$$O(T \times T') = O\left(\frac{d^{5.5}\lambda_F^2\gamma^2}{\varepsilon^4\sigma^2}\right).$$

Again plugging in our value of $\varepsilon$, the bound on the number of observations needed is

$$O\left(\frac{d^{9.5}\lambda_F^2\lambda_L^4\gamma^2}{\alpha^4\sigma^2}\right).$$

Hiding the constants ($\lambda_F$, $\lambda_L$, $\gamma$ and $\sigma$), we recover the bound above.

□
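As a check on the exponents in the final bound, suppose $\varepsilon$ is chosen so that $(d+1)\varepsilon\lambda_L = \alpha$. This is an assumption made here for illustration; it is the choice that turns the $(d+1)\varepsilon\lambda_L$ loss above into $\alpha$. Substituting gives:

```latex
\varepsilon = \frac{\alpha}{(d+1)\lambda_L}
\quad\Longrightarrow\quad
\frac{d^{5.5}\lambda_F^2\gamma^2}{\varepsilon^4\sigma^2}
= \frac{d^{5.5}(d+1)^4\lambda_F^2\lambda_L^4\gamma^2}{\alpha^4\sigma^2}
= O\!\left(\frac{d^{9.5}\lambda_F^2\lambda_L^4\gamma^2}{\alpha^4\sigma^2}\right)
```

The factor $(d+1)^4$ accounts for the jump from $d^{5.5}$ to $d^{9.5}$, and $\lambda_L^4$ enters through $\varepsilon^{-4}$.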


D Improvement with Ellipsoid in Noiseless Settings 

In this section, we present variants of the LearnPrice and LearnLead algorithms that use the Ellipsoid
algorithm as a first-order optimization method. In particular, this allows us to improve the dependence
of the query complexity on the target accuracy parameter $\alpha$. For the technique we give in this section, the
number of observations of the follower's actions has a poly-logarithmic dependence on $1/\alpha$ instead of
a polynomial one. We also improve the polynomial dependence on the dimension.

D.1 The Ellipsoid Algorithm

We will briefly describe the ellipsoid method without going into full detail. Let $\mathcal{P} \subset \mathbb{R}^d$ be a convex
body, and $f \colon \mathcal{P} \to [-B, B]$ be a continuous and convex function. Let $r, R > 0$ be such that the set $\mathcal{P}$ is
contained in a Euclidean ball of radius $R$ and contains a ball of radius $r$. The ellipsoid algorithm solves
the following problem: $\min_{p \in \mathcal{P}} f(p)$.

The algorithm requires access to a separation oracle for $\mathcal{P}$: given any $p \in \mathbb{R}^d$, it either outputs that $p$ is
a member of $\mathcal{P}$, or if $p \notin \mathcal{P}$ then it outputs a separating hyperplane between $p$ and $\mathcal{P}$. It also requires access
to a first-order oracle: given any $p \in \mathbb{R}^d$, it outputs a subgradient $w \in \partial f(p)$. The algorithm maintains an
ellipsoid $E^t$ with center $c^t$ in $\mathbb{R}^d$ over rounds, and in each round $t$ does the following:
1. If the center of the ellipsoid $c^t \notin \mathcal{P}$, it calls the separation oracle to obtain a separating hyperplane
$w^t \in \mathbb{R}^d$ such that $\mathcal{P} \subset \{p : (p - c^t)^\top w^t \le 0\}$; otherwise it calls the first-order oracle at $c^t$ to obtain
$w^t \in \partial f(c^t)$.

2. Obtain a new ellipsoid $E^{t+1}$ with center $c^{t+1}$ based on the ellipsoid $E^t$ and vector $w^t$. (See e.g. [Bub14]
for details.) We will treat this ellipsoid update as a black-box step and write it as a function ellip($E$, $c$, $w$)
that takes an ellipsoid $E$ along with its center $c$ and also a vector $w$ as input and returns a new ellipsoid
$E'$ with its center $c'$ as output.
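The black-box step ellip can be made concrete with the standard central-cut update. The sketch below is one common way to write it, assuming the ellipsoid is stored as a positive definite matrix `H` with $E = \{p : (p - c)^\top H^{-1} (p - c) \le 1\}$ and dimension $d \ge 2$; this representation and the helper `contains` are choices made for this example, not something the text specifies.

```python
import numpy as np

def ellip(H, c, w):
    """Central-cut ellipsoid update (dimension d >= 2).

    Returns the minimum-volume ellipsoid containing the part of
    {p : (p - c)^T H^{-1} (p - c) <= 1} that satisfies (p - c)^T w <= 0.
    """
    d = len(c)
    Hw = H @ w
    wHw = float(w @ Hw)
    c_new = c - Hw / ((d + 1) * np.sqrt(wHw))
    H_new = (d * d / (d * d - 1.0)) * (H - (2.0 / (d + 1)) * np.outer(Hw, Hw) / wHw)
    return H_new, c_new

def contains(H, c, p, tol=1e-9):
    """Membership test for the ellipsoid with shape matrix H and center c."""
    diff = p - c
    return float(diff @ np.linalg.solve(H, diff)) <= 1.0 + tol

rng = np.random.default_rng(1)
d = 3
H, c = 4.0 * np.eye(d), np.zeros(d)          # start from a ball of radius 2
w = np.array([1.0, -2.0, 0.5])               # hypothetical cut direction
H_new, c_new = ellip(H, c, w)

# Every sampled point of the old ellipsoid on the kept side should lie in the new one.
samples = rng.uniform(-2.0, 2.0, size=(2000, d))
kept = [p for p in samples if contains(H, c, p) and (p - c) @ w <= 0]
all_covered = all(contains(H_new, c_new, p) for p in kept)
volume_ratio = np.sqrt(np.linalg.det(H_new) / np.linalg.det(H))
print("kept half covered:", all_covered, "volume ratio:", volume_ratio)
```

The volume ratio is strictly below one on every cut, which is what drives the exponential convergence of the method.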

The sequence of ellipsoid centers $\{c^t\}$ produced by the algorithm has the following guarantee.

Theorem 44 (see e.g. [Bub14]). For $T \ge 2d^2 \log(R/r)$, the ellipsoid algorithm satisfies $\{c^1, \ldots, c^T\} \cap \mathcal{P} \ne \emptyset$
and

$$\min_{c \in \{c^1, \ldots, c^T\} \cap \mathcal{P}} f(c) - \min_{p \in \mathcal{P}} f(p) \le \frac{2BR}{r} \exp\left(-\frac{T}{2d^2}\right).$$

In other words, with at most $T = O\left(d^2 \log\left(\frac{BR}{r\varepsilon}\right)\right)$ calls to the first-order oracle, the ellipsoid algorithm
finds a point $p \in (\{c^t\} \cap \mathcal{P})$ that is $\varepsilon$-optimal for the function $f$.
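The iteration count follows by requiring the right-hand side of the bound in Theorem 44 to be at most $\varepsilon$ and solving for $T$. This is a routine rearrangement that uses only the quantities $B$, $R$, $r$ and $d$ defined above:

```latex
\frac{2BR}{r}\exp\!\left(-\frac{T}{2d^2}\right) \le \varepsilon
\quad\Longleftrightarrow\quad
T \ge 2d^2 \ln\!\left(\frac{2BR}{r\,\varepsilon}\right)
```

Together with the requirement $T \ge 2d^2 \log(R/r)$, this gives the $O(d^2 \log(BR/(r\varepsilon)))$ count, so the dependence on the accuracy $\varepsilon$ is only logarithmic.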

D.2 Learning Prices with Ellipsoid 

We will first revisit the problem in Section 3.4 and give an ellipsoid-based variant of LearnPrice that (when
there is no noise) obtains a better query complexity. Recall that we are interested in computing
a price vector $\hat{p} \in \mathcal{P}$ such that the induced bundle $x^*(\hat{p})$ is close to the target bundle $\hat{x}$.

Recall from Lemma 14 that it is sufficient to find an approximately optimal solution to the Lagrangian
dual function $g(p) = \max_{x \in C} \mathcal{L}(x, p)$. We will use the ellipsoid algorithm to find such a solution. Note
that the feasible region for prices is given explicitly: $\mathcal{P} = \{p \in \mathbb{R}^d_+ \mid \|p\| \le \sqrt{d}L\}$, so a separation oracle
for $\mathcal{P}$ is easy to implement. Furthermore, we know from Lemma 15 that for any $p$, we can obtain a subgradient
based on the buyer's purchased bundle: $(\hat{x} - x^*(p)) \in \partial g(p)$. This means we have both the separation oracle
and the first-order oracle necessary for running ellipsoid. The algorithm LearnPE is presented in Algorithm 7.
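One way to implement the separation oracle for a set of the form $\{p \ge 0, \|p\| \le \text{radius}\}$ is sketched below. The function name `separate_price_set` and its return convention (`None` for members, otherwise a vector $w$ with $\mathcal{P} \subseteq \{q : (q - p)^\top w \le 0\}$) are assumptions made for this example.

```python
import numpy as np

def separate_price_set(p, radius):
    """Separation oracle for P = {p >= 0, ||p|| <= radius}.

    Returns None if p is in P; otherwise a vector w such that every q in P
    satisfies (q - p)^T w <= 0.
    """
    p = np.asarray(p, dtype=float)
    if p.min() < 0:
        # A negative coordinate i is cut off by q_i >= 0, i.e. w = -e_i.
        w = np.zeros_like(p)
        w[int(np.argmin(p))] = -1.0
        return w
    norm = np.linalg.norm(p)
    if norm > radius:
        # Outside the ball: the outward unit direction p / ||p|| separates.
        return p / norm
    return None

radius = 2.0
inside = separate_price_set([0.5, 0.5], radius)
w_neg = separate_price_set([-0.1, 0.3], radius)
w_far = separate_price_set([3.0, 4.0], radius)
print(inside, w_neg, w_far)
```

Both branches return a valid cut: for a negative coordinate, $q_i \ge 0 > p_i$ for all feasible $q$; for a point outside the ball, $\langle q, p \rangle / \|p\| \le \|q\| \le \text{radius} < \|p\|$.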


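Algorithm 7 Learning the price vector to induce a target bundle: LearnPE($\hat{x}$, $\varepsilon$)

Input: A target bundle $\hat{x} \in \mathrm{Int}_C$ and target accuracy $\varepsilon$

Initialize: restricted price space $\mathcal{P} = \{p \in \mathbb{R}^d_+ \mid \|p\| \le \sqrt{d}L\}$, where $L = (\lambda_{\mathrm{val}})^{1/\beta} \left(\frac{4}{\varepsilon^2\sigma}\right)^{(1-\beta)/\beta}$, and

    $c^1 = 0$,   $E^1 = \{p \in \mathbb{R}^d \mid \|p\| \le \sqrt{d}L\}$,   $T = 100\, d^2 \ln\left(\frac{d\lambda_{\mathrm{val}}\gamma}{\varepsilon\sigma}\right)$

For $t = 1, \ldots, T$:

    While $c^t \notin \mathcal{P}$: obtain a separating hyperplane $w$ and let $(E^t, c^t) \leftarrow$ ellip($E^t$, $c^t$, $w$)
    Let $p^t = c^t$
    Observe the bundle purchased by the consumer, $x^*(p^t)$
    Update the ellipsoid $(E^{t+1}, c^{t+1}) \leftarrow$ ellip($E^t$, $c^t$, $(\hat{x} - x^*(p^t))$)

Output: $\hat{p} = \mathrm{argmin}_{p \in \{p^1, \ldots, p^T\}} \|\hat{x} - x^*(p)\|$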
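The loop of Algorithm 7 can be sketched end to end on a toy buyer. Everything specific here is an assumption for illustration: the buyer's valuation is $\langle a, x \rangle - \|x\|^2/2$ on $[0,1]^d$ (so $x^*(p)$ has a closed form), the price set is $\{p \ge 0, \|p\| \le \text{radius}\}$ with a hand-picked radius, the ellipsoid is stored as a shape matrix `H`, and the names `learn_pe_sketch` and `separate` are hypothetical.

```python
import numpy as np

def ellip(H, c, w):
    """Central-cut update of {p : (p - c)^T H^{-1} (p - c) <= 1}, keeping (p - c)^T w <= 0."""
    d = len(c)
    Hw = H @ w
    wHw = float(w @ Hw)
    c_new = c - Hw / ((d + 1) * np.sqrt(wHw))
    H_new = (d * d / (d * d - 1.0)) * (H - (2.0 / (d + 1)) * np.outer(Hw, Hw) / wHw)
    return H_new, c_new

def separate(p, radius):
    """None if p lies in {p >= 0, ||p|| <= radius}, else a separating direction."""
    if p.min() < 0:
        w = np.zeros_like(p)
        w[int(np.argmin(p))] = -1.0
        return w
    norm = np.linalg.norm(p)
    return p / norm if norm > radius else None

def learn_pe_sketch(x_hat, observe, radius, T):
    """Ellipsoid search over prices using only observed purchases."""
    d = len(x_hat)
    H, c = radius ** 2 * np.eye(d), np.zeros(d)
    best_p, best_gap = None, np.inf
    for _ in range(T):
        w = separate(c, radius)
        while w is not None:               # cut until the center is a feasible price
            H, c = ellip(H, c, w)
            w = separate(c, radius)
        x = observe(c)                     # purchased bundle x*(p^t)
        gap = np.linalg.norm(x_hat - x)
        if gap < best_gap:
            best_p, best_gap = c.copy(), gap
        if gap < 1e-12:                    # target induced; no useful cut remains
            break
        H, c = ellip(H, c, x_hat - x)      # (x_hat - x*(p^t)) is a subgradient of g
    return best_p, best_gap

a = np.array([1.0, 0.8])                   # hypothetical valuation parameter
x_hat = np.array([0.3, 0.5])               # target bundle
buyer = lambda p: np.clip(a - p, 0.0, 1.0) # toy best response x*(p)
best_p, best_gap = learn_pe_sketch(x_hat, buyer, radius=2.0, T=120)
print("price:", best_p, "distance of induced bundle to target:", best_gap)
```

In this toy instance the price that induces $\hat{x}$ exactly is $a - \hat{x}$, so the returned price can be compared against it directly. Note the sketch tracks the best price seen so far, which matches the argmin in the algorithm's output step.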
Theorem 45. Let $\hat{x} \in \mathrm{Int}_C$ be a target bundle and $\varepsilon > 0$. Then LearnPE($\hat{x}$, $\varepsilon$) outputs a price vector $\hat{p}$
such that the induced bundle satisfies $\|\hat{x} - x^*(\hat{p})\| \le \varepsilon$ and the number of observations it needs is no more
than


$$T = O\left(d^2 \ln\left(\frac{d\lambda_{\mathrm{val}}\gamma}{\varepsilon\sigma}\right)\right).$$


Proof. By Lemma 14, it suffices to show that there exists a price vector $p' \in \{p^t\}_{t=1}^{T}$ such that

$$g(p') \le \min_{p \in \mathcal{P}} g(p) + \frac{\varepsilon^2\sigma}{4}.$$

We will show this through the accuracy guarantee of ellipsoid. Note that the set $\mathcal{P}$ is contained in a ball of
radius $2\sqrt{d}\lambda_{\mathrm{val}}$ and contains a ball of positive radius. Furthermore, the Lagrangian dual objective value is
also bounded: for any $p \in \mathcal{P}$,

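$$\begin{aligned}
|g(p)| &= \left|\max_{x \in C} v(x) - \langle p, x - \hat{x} \rangle\right| \\
&\le \max_{x \in C} v(x) + |\langle p, x^*(p) - \hat{x} \rangle| \\
&\le \lambda_{\mathrm{val}}\gamma + \|p\| \cdot \|x^*(p) - \hat{x}\| \\
&\le \lambda_{\mathrm{val}}\gamma + \sqrt{2d}\,\lambda_{\mathrm{val}}\gamma \\
&\le 2\sqrt{2d}\,\lambda_{\mathrm{val}}\gamma.
\end{aligned}$$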

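By Theorem 44, the following holds:

$$\min_{p \in \{p^1, \ldots, p^T\}} g(p) - \min_{p' \in \mathcal{P}} g(p') \le \frac{\varepsilon^2\sigma}{4}.$$

By Lemma 14, there exists some $p \in \{p^1, \ldots, p^T\}$ such that the resulting bundle $x^*(p)$ satisfies
$\|\hat{x} - x^*(p)\| \le \varepsilon$. Since we are selecting $\hat{p}$ as $\hat{p} = \mathrm{argmin}_{p \in \{p^1, \ldots, p^T\}} \|\hat{x} - x^*(p)\|$, we must have
$\|\hat{x} - x^*(\hat{p})\| \le \varepsilon$. □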

We can now use LearnPE in place of LearnPrice as the sub-routine that induces target bundles in the
algorithm Opro. The following result follows by the same proof as Footnote 4.

Theorem 46. Let $\alpha > 0$ be the target accuracy. If we replace LearnPrice by LearnPE in our instantiation
of Opro($C$, $\alpha$), then the output price vector $\hat{p}$ has expected profit

$$\mathbb{E}[r(\hat{p})] \ge \max_{p \in \mathbb{R}^d_+} r(p) - \alpha,$$

the number of times it calls LearnPE is bounded by $d^{4.5} \cdot \mathrm{polylog}(d, 1/\alpha)$, and the number of observations the
algorithm requires from the consumer is at most

$$d^{6.5}\, \mathrm{polylog}\left(\lambda_{\mathrm{val}}, \gamma, \frac{1}{\alpha}, \frac{1}{\sigma}\right).$$
Algorithm 8 Learning the toll vector to induce a target flow: LearnTE($\hat{f}$, $\varepsilon$)

Input: A target flow $\hat{f} \in \mathcal{F}$, and target accuracy $\varepsilon$

Initialize: restricted toll space $\mathcal{P} = \{p \in \mathbb{R}^m_+ \mid \|p\| \le m\}$, and

    $c^1 = 0$,   $E^1 = \{x \in \mathbb{R}^m \mid \|x\| \le m\}$,   $T = 100\, m^2 \ln\left(\frac{m}{\varepsilon\sigma}\right)$

For $t = 1, \ldots, T$:

    While $c^t \notin \mathcal{P}$: obtain a separating hyperplane $w$ and let $(E^t, c^t) \leftarrow$ ellip($E^t$, $c^t$, $w$)
    Let $\tau^t = c^t$
    Observe the induced flow $f^*(\tau^t)$
    Update the ellipsoid $(E^{t+1}, c^{t+1}) \leftarrow$ ellip($E^t$, $c^t$, $-(\hat{f} - f^*(\tau^t))$)

Output: $\hat{\tau} = \mathrm{argmin}_{\tau \in \{\tau^1, \ldots, \tau^T\}} \|\hat{f} - f^*(\tau)\|$


D.3 Learning Tolls with Ellipsoid 

We will also revisit the problem in Section 5. We give a similar ellipsoid-based algorithm to induce a target
flow. See Algorithm 8.

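Theorem 47. Let $\hat{f} \in \mathcal{F}$ be a target flow and $\varepsilon > 0$. Then LearnTE($\hat{f}$, $\varepsilon$) outputs a toll vector $\hat{\tau}$ such
that the induced flow satisfies $\|\hat{f} - f^*(\hat{\tau})\| \le \varepsilon$ and the number of observations it needs is no more than

$$T = O\left(m^2 \ln\left(\frac{m}{\varepsilon\sigma}\right)\right).$$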

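Proof. Let the function $g$ be defined as

$$g(\tau) = \min_{f \in \mathcal{F}} \Phi(f) + \langle \tau, f - \hat{f} \rangle.$$

It suffices to show that there exists some $\tau' \in \{\tau^1, \ldots, \tau^T\}$ such that $g(\tau') \ge \max_{\tau \in \mathcal{P}} g(\tau) - \frac{\varepsilon^2\sigma}{2}$.
Before we instantiate the accuracy theorem of ellipsoid, note that the set $\mathcal{P}$ is contained in a ball of radius
$m$ and contains a ball of radius $m/2$, and also that the value of $g(\tau)$ is bounded for any $\tau \in \mathcal{P}$:

$$\begin{aligned}
|g(\tau)| &= \left|\min_{f \in \mathcal{F}} \Phi(f) + \langle \tau, f - \hat{f} \rangle\right| \\
&\le \max_{f \in \mathcal{F}} \Phi(f) + \max_{f \in \mathcal{F}} \|f - \hat{f}\| \, \|\tau\| \\
&\le m + \sqrt{2m}\, m \le 2\sqrt{m^3}.
\end{aligned}$$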


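Given that $T = 4m^2 \ln(m/(\varepsilon\sigma))$, we know by Theorem 44 that

$$\max_{\tau' \in \{\tau^1, \ldots, \tau^T\}} g(\tau') - \max_{\tau \in \mathcal{P}} g(\tau) \ge -\frac{\varepsilon^2\sigma}{2}.$$

Therefore, the output toll vector satisfies

$$\|\hat{f} - f^*(\hat{\tau})\| \le \varepsilon.$$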


This completes our proof. 


□ 
Finally, with this convergence bound, we can also improve the result of Lemma 28.

Theorem 48. Let $\alpha > 0$ be the target accuracy. If we replace LearnLead by LearnTE in the instantiation
of LearnOpt($\mathcal{A}_F$, $\alpha$), then the output toll vector $\hat{\tau}$ and its induced flow $\hat{f} = f^*(\hat{\tau})$ are $\alpha$-approximately
optimal in expectation:

$$\mathbb{E}\left[\Psi(\hat{f})\right] \le \min_{f \in \mathcal{F}} \Psi(f) + \alpha.$$

The number of times it calls LearnTE is bounded by $m^{4.5} \cdot \mathrm{polylog}(m, 1/\alpha)$, and so the total number of
observations we need on the flow behavior is bounded by

$$m^{6.5}\, \mathrm{polylog}\left(\lambda, \frac{1}{\alpha}, \frac{1}{\sigma}\right).$$